Abstract

This paper proves that an "old dog", namely the classical Johnson-Lindenstrauss transform, 
"performs new tricks": it gives a novel way of preserving differential privacy. We show that 
if we take two databases, D and D', such that (i) D' - D is a rank-1 matrix of bounded norm 
and (ii) all singular values of D and D' are sufficiently large, then multiplying either D or D' 
with a vector of iid normal Gaussians yields two statistically close distributions in the sense of 
differential privacy. Furthermore, a small, deterministic and public alteration of the input is 
enough to assert that all singular values of D are large. 

We apply the Johnson-Lindenstrauss transform to the task of approximating cut-queries: 
the number of edges crossing a $(S, \bar{S})$-cut in a graph. We show that the JL transform allows 
us to publish a sanitized graph that preserves edge differential privacy (where two graphs are 
neighbors if they differ on a single edge) while adding only $O(|S|/\epsilon)$ random noise to any 
given query (w.h.p). Comparing the additive noise of our algorithm to existing algorithms for 
answering cut-queries in a differentially private manner, we outperform all others on small cuts 
($|S| = o(n)$). 

We also apply our technique to the task of estimating the variance of a given matrix in 
any given direction. The JL transform allows us to publish a sanitized covariance matrix that 
preserves differential privacy w.r.t bounded changes (each row in the matrix can change by at 
most a norm-1 vector) while adding random noise of magnitude independent of the size of the 
matrix (w.h.p). In contrast, existing algorithms introduce an error which depends on the matrix 
dimensions. 

1 Introduction 



The celebrated Johnson-Lindenstrauss transform [JL84] is widely used across many areas of Computer 
Science. A very non-exhaustive list of related applications includes metric and graph embeddings 
[Bou85, LLR94], computational speedups [Sar06, Vem05], machine learning [BBV06, Sch00], 
information retrieval [PRT+98], nearest-neighbor search [Kle97, IM98, AC06], and compressed sensing 
[BDDW08]. This paper unveils a new application of the Johnson-Lindenstrauss transform: it 
also preserves differential privacy. 

Consider a scenario in which a trusted curator gathers personal information from n individuals, 
and wishes to release statistics about these individuals to the public without compromising any 
individual's privacy. Differential privacy [DMNS06] provides a robust guarantee of privacy for such 
data releases. It guarantees that for any two neighboring databases (databases that differ on the 
details of any single individual), the curator's distributions over potential outputs are statistically 
close (see formal definition in Section 2). By itself, preserving differential privacy isn't hard, since 
the curator's answers to users' queries can be so noisy that they obliterate any useful data stored 
in the database. Therefore, the key research question in this field is to provide tight utility and 
privacy tradeoffs. 

The most basic technique that preserves differential privacy and gives good utility guarantees is 
to add relatively small Laplace or Gaussian noise to a query's true answer. This simple technique 
lies at the core of an overwhelming majority of algorithms that preserve differential privacy. In fact, 
many differentially private algorithms follow a common outline. They take an existing algorithm 
and revise it by adding such random noise each time the algorithm operates on the sensitive data. 
Proving that the revised algorithm preserves differential privacy is almost immediate, because 
differential privacy is composable. On the other hand, providing good bounds on the revised 
algorithm's utility follows from bounding the overall noise added to the algorithm, which is often 
difficult. This work takes the complementary approach. We show that an existing algorithm 
preserves differential privacy provided we slightly alter the input in a reversible way. Our analysis 
of the algorithm's utility is immediate, whereas privacy guarantees require a non-trivial proof. 

We prove that by multiplying a given database with a vector of iid normal Gaussians, we can 
output the result while preserving differential privacy (assuming the database has certain properties, 
see "our technique"). This technique is none other than the Johnson-Lindenstrauss transform, and 
it is guaranteed to preserve w.h.p the $L_2$ norm of the given database up to a small multiplicative 
factor. Therefore, whenever answers to users' queries can be formalized as the length of the product 
between the given database and a query-vector, utility bounds are straightforward. 

For example, consider the case where our input is composed of n points in $\mathbb{R}^d$ given as an $n \times d$ 
matrix A. We define two matrices as neighbors if they differ on a single row and the norm of the 
difference is at most 1.^1 Under this notion of neighbors, a simple privacy preserving mechanism 
allows us to output the mean of the rows in A, but what about the covariance matrix $A^T A$? We 
prove that the JL transform gives an $(\epsilon, \delta)$-differentially private algorithm that outputs a sanitized 
covariance matrix. Furthermore, for directional variance queries, where users give a unit-length 
vector x and wish to know the variance of A along x (see definition in Section 2), we give utility 
bounds that are independent of d and n. In contrast, all other differentially private algorithms that 
answer directional variance queries have utility guarantees that depend on d or n. Observe that 
our utility guarantees are somewhat weaker than usual. Recall that the JL lemma guarantees that 
w.h.p lengths are preserved up to a small multiplicative error, so for each query our algorithm's 
estimation has w.h.p small multiplicative error and additional additive error. 

^1 This notion of neighboring inputs, also considered in [MM09, HR12], is somewhat different than the typical notion 
of privacy, allowing any individual to change her attributes arbitrarily. 

A special case of directional variance queries is cut-queries of a graph. Suppose our database is a 
graph G and users wish to know how many edges cross a $(S, \bar{S})$-cut. Such a query can be formalized 
by the (squared) length of the product $E_G \mathbf{1}_S$, where $E_G$ is the edge-matrix of G and $\mathbf{1}_S$ is the indicator vector 
of S (see Section 2). We prove that the JL transform allows us to publish a perturbed Laplacian 
of G while preserving $(\epsilon, \delta)$-differential privacy, w.r.t two graphs being neighbors if they differ only 
on a single edge. Comparing our algorithm to existing algorithms, we show that we add (w.h.p) 
$O(|S|)$ random noise to the true answer (alternatively: w.h.p we add only constant noise to the 
query $\Phi_G(S)/|S|$). In contrast, all other algorithms add noise proportional to the number of vertices 
(or edges) in the graph. 

Our technique. It is best to demonstrate our technique on a toy example. Assume D is a database 
represented as a $\{0,1\}^n$-vector, and suppose we sample a vector Y of n iid normal Gaussians and 
publish $X = Y^T D$. Our output is therefore distributed like a Gaussian random variable of mean 0 
and variance $\sigma^2 = \|D\|^2$. Assume a single entry in D changes from 0 to 1 and denote the new 
database as D'. Then $X' = Y^T D'$ is distributed like a Gaussian of 0-mean and variance $\lambda^2 = \|D\|^2 + 1$. 
Comparing $PDF_X(x) = (2\pi\sigma^2)^{-1/2} \exp(-x^2/(2\sigma^2))$ to $PDF_{X'}(x) = (2\pi\lambda^2)^{-1/2} \exp(-x^2/(2\lambda^2))$ 
we have that 

$$\forall x, \quad \sqrt{\lambda^2/\sigma^2} \; PDF_{X'}(x) \ge PDF_X(x) \ge \exp\left(-\frac{x^2}{2\sigma^2} \cdot \frac{1}{\lambda^2}\right) PDF_{X'}(x).$$

Using concentration bounds on Gaussians we deduce that if $\sigma^2 = \Omega(\log(1/\delta)/\epsilon)$, then w.p $\ge 1 - \delta$ both PDFs are 
within multiplicative factor of $e^{\pm\epsilon}$. We now repeat this process r times (setting $\epsilon, \delta$ accordingly) 
s.t. the JL lemma assures that (after scaling) w.h.p we output a vector of squared norm $(1 \pm \eta)\|D\|^2$ 
for a given $\eta$. We get utility guarantees for publishing the number of ones in D while preserving 
$(\epsilon, \delta)$-differential privacy. 
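
The sandwich inequality of the toy example can be checked numerically. The following is a minimal sketch, assuming only the two variances of the toy example ($\sigma^2 = \|D\|^2$ for X and $\lambda^2 = \|D\|^2 + 1$ for X'); the helper names `gaussian_pdf` and `ratio_bounds` are ours and do not appear in the paper.

```python
import math

def gaussian_pdf(x, var):
    """Density of a zero-mean Gaussian with variance `var`, evaluated at x."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def ratio_bounds(x, ones_in_D):
    """Return (lower, ratio, upper) for PDF_X(x) / PDF_X'(x) in the toy example."""
    sigma2 = float(ones_in_D)   # variance of X  = ||D||^2
    lam2 = sigma2 + 1.0         # variance of X' = ||D||^2 + 1
    ratio = gaussian_pdf(x, sigma2) / gaussian_pdf(x, lam2)
    upper = math.sqrt(lam2 / sigma2)
    lower = math.exp(-x * x / (2.0 * sigma2 * lam2))
    return lower, ratio, upper

# The ratio stays close to 1 for typical x (|x| of order sigma = 10 here)
# and only drifts away far out in the tail.
for x in (0.0, 3.0, 25.0):
    lo, ratio, hi = ratio_bounds(x, ones_in_D=100)
    print(f"x={x:5.1f}  lower={lo:.4f}  ratio={ratio:.4f}  upper={hi:.4f}")
```

The upper bound is attained at x = 0, while the lower bound degrades as |x| grows, which is why the lower bound only holds with probability $1 - \delta$ over x.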

Keeping with our toy example, one step remains: to convert the above analysis so that it will 
hold for any database, and not only databases with at least $w = \log(1/\delta)/\epsilon$ many ones. One way is to 
append the data with w one entries, but observe: this ends up in outputting X + N where N is 
random Gaussian noise! In other words, appending the data with ones makes the above technique 
worse (noisier) than the classical technique of adding random Gaussian noise. Instead, what we do 
is to "translate the database". We apply a simple deterministic affine transformation s.t. D turns 
into a $\{\sqrt{w/n}, 1\}^n$-vector. Applying the JL algorithm to the translated database, we output a vector 
whose norm squared is $\approx (1 \pm \eta)(\|D\|^2 + w)$. Clearly, users can subtract w from the result, and we 
end up with $\eta w$ additive random noise (in addition to the multiplicative noise).^2 
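
As a quick check of the translation step, assume the affine map sends every 0-entry to $\sqrt{w/n}$ and leaves every 1-entry at 1 (the concrete map $x \mapsto \sqrt{w/n} + (1 - \sqrt{w/n})x$ and the name $\tilde{D}$ are our reading of the "simple deterministic affine transformation", not notation from the paper). Then the squared norm of the translated vector is:

```latex
\|\tilde{D}\|^2
  = \underbrace{\|D\|^2 \cdot 1}_{\text{entries equal to } 1}
  + \underbrace{\bigl(n - \|D\|^2\bigr) \cdot \frac{w}{n}}_{\text{entries equal to } \sqrt{w/n}}
  = \Bigl(1 - \frac{w}{n}\Bigr)\|D\|^2 + w .
```

For $w \ll n$ this is approximately $\|D\|^2 + w$, and a multiplicative JL error of $\eta$ on this quantity leaves an additive error of about $\eta w$ once w is subtracted.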

^2 Observe that in this toy example, our $O(\log(1/\delta)/\epsilon)$ noise bound is still worse than the noise bound of 
$O(\sqrt{\log(1/\delta)}/\epsilon)$ one gets from adding Gaussian noise. However, in the applications detailed in Sections 3 and 4 
the idea of changing the input will be the key ingredient in getting noise bounds that are independent of n and d. 

It is tempting to think the above analysis suffices to show that privacy is also preserved in the 
multidimensional case. After all, if we multiply the edge matrix of a graph G with a vector of iid 
normal Gaussians, we get a vector with each entry distributed like a Gaussian; and if we replace G 
with a neighboring G', we affect only two entries in this vector. Presumably, applying the previous 
analysis to both entries suffices to prove we preserve differential privacy. But this intuition is false. 
Multiplying $E_G$ with a random vector does not result in n independent Gaussians, but rather in 
one multivariate Gaussian. This is best illustrated with an example. Suppose G is a graph and S 
is a subset of nodes s.t. no edge crosses the $(S, \bar{S})$-cut. Therefore $E_G \mathbf{1}_S$ is the zero-vector, and no 
matter what random projection we pick, $Y^T E_G \mathbf{1}_S = 0$. In contrast, by adding a single edge that 
crosses the $(S, \bar{S})$-cut, we get a graph G' s.t. $\Pr[Y^T E_{G'} \mathbf{1}_S \neq 0] = 1$. 
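
The counterexample can be reproduced on a four-node graph. This is an illustrative sketch only: the helper `edge_matrix`, the specific graph, and the seed are our choices, and the edge matrix follows the convention defined in Section 2 (one row per vertex pair u < v, with $+\sqrt{w_{u,v}}$ at u and $-\sqrt{w_{u,v}}$ at v).

```python
import itertools
import numpy as np

def edge_matrix(W):
    """Edge matrix E_G: one row per pair u < v, +sqrt(w) at column u, -sqrt(w) at column v."""
    n = W.shape[0]
    pairs = list(itertools.combinations(range(n), 2))
    E = np.zeros((len(pairs), n))
    for row, (u, v) in enumerate(pairs):
        E[row, u] = np.sqrt(W[u, v])
        E[row, v] = -np.sqrt(W[u, v])
    return E

# G: two disjoint edges {0,1} and {2,3}; no edge crosses the cut S = {0,1}.
W = np.zeros((4, 4))
W[0, 1] = W[1, 0] = 1.0
W[2, 3] = W[3, 2] = 1.0
ind_S = np.array([1.0, 1.0, 0.0, 0.0])

# G': the same graph plus the crossing edge {1,2}.
W2 = W.copy()
W2[1, 2] = W2[2, 1] = 1.0

rng = np.random.default_rng(0)
Y = rng.standard_normal(6)                        # one Gaussian per vertex pair
out_G = float(Y @ (edge_matrix(W) @ ind_S))       # E_G 1_S is the zero vector
out_G2 = float(Y @ (edge_matrix(W2) @ ind_S))     # picks up the Gaussian of pair (1,2)
```

Whatever Y is drawn, `out_G` is zero while `out_G2` equals the Gaussian attached to the pair (1,2), so an observer who sees a non-zero value learns with certainty that a crossing edge exists.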

Organization. Next we detail related work. Section 2 details important notations and important 
preliminaries. In Sections 3 and 4 we convert the above univariate intuition to the multivariate 
Gaussian case. Section 3 describes our results for graphs and cut-queries, and in Section 3.2 we 
compare our method to other algorithms. Section 4 details the result for directional queries (the 
general case), followed by a comparison with other algorithms. Even though there are clear similarities 
between the analyses in Sections 3 and 4, we provide both because the graph case is simpler and 
analogous to the univariate Gaussian case. Suppose G and G' are two graphs without and with 
a certain edge resp., then G induces the multivariate Gaussian with the "smaller" variance, and 
G' induces the multivariate Gaussian with the "larger" variance. In contrast, in the general case 
there's no notion of "smaller" and "larger" variances. Also, the noise bound in the general case is 
larger than the one for the graph case, and the theorems our analysis relies on are more esoteric. 
Section 5 concludes with a discussion and open problems. 

1.1 Related Work 

Differential privacy was developed through a series of papers [DN03, DMNS06, CDM+05, BDMN05]. 
Dwork et al [DMNS06] gave the first formal definition and the description of the basic Laplace 
mechanism. Its Gaussian equivalent was defined in [DKM+06]. Other mechanisms for preserving 
differential privacy include the Exponential Mechanism of McSherry and Talwar [MT07, BLR08]; 
the recent Multiplicative Weights mechanism of Hardt and Rothblum [HR10] and its various 
extensions [HLM10, GHRU11, GRU12]; the Median Mechanism [RR10] and a boosting mechanism 
of Dwork et al [DRV10]. In addition, the classical Randomized Response (see [War65]) preserves 
differential privacy as discussed in recent surveys [DS10, Dwo11]. The task of preserving differential 
privacy when the given database is a graph or a social network was studied by Hay et al [HLMJ09] 
who presented a privacy preserving algorithm for publishing the degree distribution in a graph. 
They also introduce multiple notions of neighboring graphs, one of which is for the change of a 
single edge. Nissim et al [NRS07] (see full version) studied the case of estimating the number of 
triangles in a graph, and Karwa et al [KRSY11] extended this result to other graph structures. 
Gupta et al [GRU12] studied the case of answering (S, T)-cut queries, for two disjoint subsets of 
nodes S and T. All latter works use the same notion of neighboring graphs as we do. In differential 
privacy it is common to think of a database as a matrix, but seldom does one give utility guarantees 
for queries regarding global properties of the input matrix. Blum et al [BDMN05] approximate 
the input matrix with the PCA construction by adding $O(d^2)$ noise to the input. The work of 
McSherry and Mironov [MM09] (inspired by the Netflix prize competition) defines neighboring 
databases as a change in a single entry, and introduces $O(k^2)$ noise while outputting a rank-k 
approximation of the input. The work of Hardt and Roth [HR12] gives a low-rank approximation of 
a given input matrix while adding $\min\{\sqrt{d}, \sqrt{n}\}$ noise by following the elegant framework of Halko 
et al [HMT11]. According to [HR12], a recent and not-yet-published work of Kapralov, McSherry 
and Talwar preserves rank-1 approximations of a given PSD matrix with error $O(n)$. 

The body of work on the JL transform is by now so extensive that only a book may survey 
it properly [Vem05]. In the context of differential privacy, the JL lemma has been used to reduce 
dimensionality of an input prior to adding noise or other forms of privacy preservation. Blum et 
al [BLR08] gave an algorithm that outputs a sanitized dataset for learning large-margin classifiers 
by appealing to JL related results of [BBV06]. Hardt and Roth [HR12] gave a privacy preserving 
version of an algorithm of [HMT11] that uses randomized projections onto the image space of a given 
matrix. Blum and Roth [?] used it to reduce the noise added to answering sparse queries. The way 
the JL lemma was applied in these works is very different than the way we use it. 

2 Basic Definitions, Preliminaries and Notations 

Privacy and utility. In this work, we deal with two types of inputs: [0,1]-weighted graphs over n 
nodes and $n \times d$ real matrices. (We treat $w_{a,b} = 0$ as no edge between a and b.) Trivially extending 
the definition in [NRS07, KRSY11], two weighted n-node graphs G and G' are called neighbors 
if they differ on the weight of a single edge (a,b). Like in [HR12], two $n \times d$ matrices A and A' are called 
neighbors if all the coordinates on which A and A' differ lie on a single row i, s.t. $\|A_{(i)} - A'_{(i)}\| \le 1$, 
where $A_{(i)}$ denotes the i-th row of A. 

Definition 2.1. An algorithm ALG which maps inputs into some range $\mathcal{R}$ maintains $(\epsilon, \delta)$-differential 
privacy if for all pairs of neighboring inputs $I, I'$ and for all subsets $\mathcal{S} \subset \mathcal{R}$ it holds that 

$$\Pr[ALG(I) \in \mathcal{S}] \le e^{\epsilon} \Pr[ALG(I') \in \mathcal{S}] + \delta$$

For each type of input we are interested in answering a different type of query. For graphs, we are 
interested in cut-queries: given a nonempty subset S of the vertices of the graph, we wish to know 
the total weight of edges crossing the $(S, \bar{S})$-cut. We denote this as $\Phi_G(S) = \sum_{u \in S, v \notin S} w_{u,v}$. 

Definition 2.2. We say an algorithm ALG gives a $(\eta, \tau, \nu)$-approximation for cut queries, if for 
every nonempty S it holds that 

$$\Pr\left[(1 - \eta)\Phi_G(S) - \tau \le ALG(S) \le (1 + \eta)\Phi_G(S) + \tau\right] \ge 1 - \nu$$

For $n \times d$ matrices, we are interested in directional variance queries: given a unit-length direction 
x, we wish to know the variance of A along the x direction: $\Phi_A(x) = x^T A^T A x$. (Our 
algorithm normalizes A s.t. the mean of its n rows is 0.) 

Definition 2.3. We say an algorithm ALG gives a $(\eta, \tau, \nu)$-approximation for directional variance 
queries, if for every unit-length vector x it holds that 

$$\Pr\left[(1 - \eta)\Phi_A(x) - \tau \le ALG(x) \le (1 + \eta)\Phi_A(x) + \tau\right] \ge 1 - \nu$$

Some Linear Algebra. Given an $m \times n$ matrix M, its Singular Value Decomposition (SVD) is 
$M = U \Sigma V^T$ where $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ are unitary matrices, and $\Sigma$ has non-zero values only 
on its main diagonal. Furthermore, there are exactly rank(M) positive values on the main diagonal, 
denoted $\sigma_1(M) \ge \ldots \ge \sigma_{rank(M)}(M)$, called the singular values. This allows us to write M as the 
sum of rank(M) rank-1 matrices: $M = \sum_{i=1}^{rank(M)} \sigma_i u_i v_i^T$. Because $\Sigma$ has non-zero values only on its 
main diagonal, the notation $\Sigma^k$ denotes a matrix whose non-zero values lie only on the main diagonal 
and are $\sigma_1^k(M), \sigma_2^k(M), \ldots, \sigma_{rank(M)}^k(M)$. Using the SVD, it is clear that if M is of full-rank, then 
$M^{-1} = V \Sigma^{-1} U^T$, and that if n = m = rank(M) then $\det(M) = \prod_{i=1}^{n} \sigma_i(M)$. Furthermore, even 
when M is not full-rank, the SVD allows us to use similar notation to denote the generalizations 
of the inverse and of the determinant: The Moore-Penrose inverse of M is $M^\dagger = V \Sigma^{-1} U^T$; and 
the pseudo-determinant of M is $\tilde{\det}(M) = \prod_{i=1}^{rank(M)} \sigma_i(M)$. An $n \times n$ symmetric matrix M is called 
positive semidefinite (PSD) if it holds that $x^T M x \ge 0$ for every $x \in \mathbb{R}^n$. Given two PSDs M and 
N we denote the fact that (N - M) is PSD by $M \preceq N$. For further details, see [HJ90]. 

Gaussian distribution. Given a r.v. X, we denote by $X \sim \mathcal{N}(\mu, \sigma^2)$ the fact that X has normal 
distribution with mean $\mu$ and variance $\sigma^2$. Recall that $PDF_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp(-(x - \mu)^2 / 2\sigma^2)$. 
We repeatedly apply the linear combination rule: for any two independent normal random variables s.t. 
$X \sim \mathcal{N}(\mu_X, \sigma_X^2)$ and $Y \sim \mathcal{N}(\mu_Y, \sigma_Y^2)$, we have that their linear combination Z = aX + bY is 
distributed according to $Z \sim \mathcal{N}(a\mu_X + b\mu_Y, a^2\sigma_X^2 + b^2\sigma_Y^2)$. This in turn allows us to identify 
a random variable $R \sim \mathcal{N}(0, \sigma^2)$ with the random variable $\sigma R'$, where $R' \sim \mathcal{N}(0, 1)$. Classic 
concentration bounds on Gaussians give that $\Pr[|X - \mu| > \sqrt{2\log(1/\delta)} \, \sigma] \le 2\delta$. 

The multivariate normal distribution is the multi-dimension extension of the univariate normal 
distribution. $X \sim \mathcal{N}(\mu, \Sigma)$ denotes an m-dimensional multivariate r.v. whose mean is $\mu \in \mathbb{R}^m$, and 
variance is the PSD matrix $\Sigma = E[(X - \mu)(X - \mu)^T]$. If $\Sigma$ has full rank ($\Sigma$ is positive definite) 
then (stated here for $\mu = 0$) $PDF_X(x) = \frac{1}{\sqrt{(2\pi)^m \det(\Sigma)}} \exp(-\frac{1}{2} x^T \Sigma^{-1} x)$, a well defined function. If $\Sigma$ has non-trivial kernel 
space then $PDF_X$ is technically undefined (since X is defined only on a subspace of volume 0, yet 
$\int_{\mathbb{R}^m} PDF_X(x) dx = 1$). However, if we restrict ourselves only to the subspace $V = (Ker(\Sigma))^\perp$, then 
$PDF_X^V$ is defined over V and $PDF_X^V(x) = \frac{1}{\sqrt{(2\pi)^{rank(\Sigma)} \tilde{\det}(\Sigma)}} \exp(-\frac{1}{2} x^T \Sigma^\dagger x)$. From now on, we omit 
the superscript from the PDF and refer to the above function as the PDF of X. Observe that using 
the SVD, we can denote $\Sigma = U \, diag(\sigma_1^2, \sigma_2^2, \ldots, \sigma_r^2, 0, \ldots, 0) \, U^T$, and so V is the subspace spanned 
by the first r columns of U. The multivariate extension of the linear combination rule is as follows. If A 
is an $n \times m$ matrix, then the multivariate r.v. Y = AX is distributed as $Y \sim \mathcal{N}(A\mu, A \Sigma A^T)$. 
For further details regarding multivariate Gaussians see [Mil64]. 

Finally, we conclude these Gaussian preliminaries with the famous Johnson-Lindenstrauss Lemma, 
our main tool in this paper. 

Theorem 2.4 (The Johnson-Lindenstrauss transform [JL84]). Fix any $0 < \eta < 1/2$. Let M be an 
$r \times m$ matrix whose entries are iid samples from $\mathcal{N}(0, 1)$. Then $\forall x \in \mathbb{R}^m$, 

$$\Pr_M\left[(1 - \eta)\|x\|^2 \le \frac{1}{r}\|Mx\|^2 \le (1 + \eta)\|x\|^2\right] \ge 1 - 2\exp(-\eta^2 r / 8)$$
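
The statement of Theorem 2.4 can be tried out empirically. The sketch below is illustrative only: `jl_rows` solves $2\exp(-\eta^2 r/8) \le \nu$ for r, and `jl_squared_norm` computes $\frac{1}{r}\|Mx\|^2$; both helper names, the test vector and the seed are our own choices.

```python
import math
import numpy as np

def jl_rows(eta, nu):
    """Smallest integer r with 2 * exp(-eta^2 * r / 8) <= nu."""
    return math.ceil(8.0 * math.log(2.0 / nu) / eta ** 2)

def jl_squared_norm(x, r, rng):
    """Estimate ||x||^2 as (1/r) * ||M x||^2 for an r x m matrix M of iid N(0,1) entries."""
    M = rng.standard_normal((r, x.shape[0]))
    return float(np.sum((M @ x) ** 2) / r)

rng = np.random.default_rng(1)
eta, nu = 0.25, 0.01
r = jl_rows(eta, nu)                 # number of rows needed for failure probability nu
x = rng.standard_normal(50)          # an arbitrary fixed vector in R^50
estimate = jl_squared_norm(x, r, rng)
true_value = float(x @ x)
print(r, true_value, estimate)
```

Note that r depends only on $\eta$ and $\nu$, not on the dimension m of x.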



Laplacians and edge-matrices. An undirected weighted graph G = (V(G), E(G)) can be represented 
in various ways. One representation is by the adjacency matrix A, where $A_{u,v} = w_{u,v}$. 
Another way is by the $\binom{n}{2} \times n$ edge matrix of the graph, $E_G$. We assume that the vertices of G are 
ordered arbitrarily, and for each pair of vertices {u,v} where u < v, there exists a row in $E_G$. The 
entries of $E_G$ are 

$$(E_G)_{(u,v),x} = \begin{cases} \sqrt{w_{u,v}}, & \text{if } u \sim_G v \text{ and } x = u \\ -\sqrt{w_{u,v}}, & \text{if } u \sim_G v \text{ and } x = v \\ 0, & \text{o/w} \end{cases}$$

where $u \sim_G v$ denotes that (u, v) is an edge in G. Alternatively, one can represent G using the 
Laplacian of the graph $L_G = E_G^T E_G$. Formally, the matrix $L_G$ is the matrix whose diagonal 
entries are $(L_G)_{u,u} = \sum_{v \neq u} w_{u,v}$, and whose off-diagonal entries are $(L_G)_{u,v} = -w_{u,v}$. It is simple 
to verify that for any x, the following equality holds: $x^T L_G x = \sum_{u \sim_G v} w_{u,v}(x_u - x_v)^2$. As a 
corollary, if we take any nonempty $S \subset V(G)$ and denote its $\{0,1\}^n$-indicator vector as $\mathbf{1}_S$, then 
$\mathbf{1}_S^T L_G \mathbf{1}_S = \|E_G \mathbf{1}_S\|^2 = \sum_{u \in S, v \notin S} w_{u,v} = \Phi_G(S)$. 
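
The identities relating the edge matrix, the Laplacian and cut weights can be verified on a small random graph. This is a sketch under our own naming (`edge_matrix`, `cut_weight`) with an arbitrary weighted graph; it is not code from the paper.

```python
import itertools
import numpy as np

def edge_matrix(W):
    """Edge matrix E_G: one row per pair u < v, +sqrt(w) at column u, -sqrt(w) at column v."""
    n = W.shape[0]
    pairs = list(itertools.combinations(range(n), 2))
    E = np.zeros((len(pairs), n))
    for row, (u, v) in enumerate(pairs):
        E[row, u] = np.sqrt(W[u, v])
        E[row, v] = -np.sqrt(W[u, v])
    return E

def cut_weight(W, S):
    """Phi_G(S): total weight of edges with exactly one endpoint in S."""
    inside = np.zeros(W.shape[0], dtype=bool)
    inside[list(S)] = True
    return float(W[np.ix_(inside, ~inside)].sum())

rng = np.random.default_rng(2)
n = 6
W = np.triu(rng.uniform(0.0, 1.0, size=(n, n)), k=1)
W = W + W.T                           # symmetric [0,1]-weights, zero diagonal

E = edge_matrix(W)
L = E.T @ E                           # Laplacian L_G = E_G^T E_G
S = [0, 2, 3]
ind_S = np.zeros(n)
ind_S[S] = 1.0
quadratic_form = float(ind_S @ L @ ind_S)   # should match cut_weight(W, S)
```
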
Additional notations. We denote by $e_a$ the indicator vector of a. We denote by $e_{a,b} = e_a - e_b$. 
It follows that the $n \times n$ matrix $L_{a,b} = e_{a,b} e_{a,b}^T$ is the matrix whose projection over coordinates 
a, b is $\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}$, while every other entry is 0. We also denote $E_{a,b}$ as the $\binom{n}{2} \times n$ matrix, 
whose rows are all zeros except for the row indexed by the (a, b) pair, which is $e_{a,b}^T$. Observe: 
$L_{a,b} = e_{a,b} e_{a,b}^T = E_{a,b}^T E_{a,b}$. 

3 Publishing a Perturbed Laplacian 
3.1 The Johnson-Lindenstrauss Algorithm 

We now show that the Johnson Lindenstrauss transform preserves differential privacy. We first 
detail our algorithm, then analyze it. 



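Algorithm 1: Outputting the Laplacian of a Graph while Preserving Differential Privacy 

Input: An n-node graph G, parameters: $\epsilon, \delta, \eta, \nu > 0$ 
Output: A Laplacian of a graph $\tilde{L}$ 

1 Set $r = \frac{8\ln(2/\nu)}{\eta^2}$, and $w = \frac{\sqrt{32 r \ln(2/\delta)}}{\epsilon} \ln(4r/\delta)$ 
2 For every $u \neq v$, set $w_{u,v} \leftarrow \frac{w}{n} + \left(1 - \frac{w}{n}\right) w_{u,v}$ 
3 Pick a matrix M of size $r \times \binom{n}{2}$, whose entries are iid samples of $\mathcal{N}(0, 1)$ 
4 return $\tilde{L} = \frac{1}{r} E_G^T M^T M E_G$ 

Algorithm 2: Approximating $\Phi_G(S)$ 

Input: A nonempty $S \subset V(G)$, parameters n, w and Laplacian $\tilde{L}$ from Algorithm 1 
return $R(S) = \frac{1}{1 - \frac{w}{n}} \left( \mathbf{1}_S^T \tilde{L} \mathbf{1}_S - w \frac{s(n-s)}{n} \right)$, where s = |S| 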
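
The two algorithms can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' implementation: the constants in `jl_parameters` follow our reading of step 1 of Algorithm 1, all function names are ours, and the demonstration uses hypothetical small values `w_demo` and `r_demo` so that it runs on a 30-node graph. With the values from step 1, w is in the thousands for typical parameters, so n must be much larger and the small demo values give no privacy guarantee.

```python
import itertools
import math
import numpy as np

def jl_parameters(eps, delta, eta, nu):
    """Step 1 of Algorithm 1 (constants as we read them from the text; an assumption)."""
    r = math.ceil(8.0 * math.log(2.0 / nu) / eta ** 2)
    w = math.sqrt(32.0 * r * math.log(2.0 / delta)) / eps * math.log(4.0 * r / delta)
    return r, w

def edge_matrix(W):
    """Edge matrix: one row per pair u < v, +sqrt(w) at column u, -sqrt(w) at column v."""
    n = W.shape[0]
    pairs = list(itertools.combinations(range(n), 2))
    E = np.zeros((len(pairs), n))
    for row, (u, v) in enumerate(pairs):
        E[row, u] = np.sqrt(W[u, v])
        E[row, v] = -np.sqrt(W[u, v])
    return E

def perturbed_laplacian(W, w, r, rng):
    """Steps 2-4 of Algorithm 1: reweight every pair, project, return (1/r) E^T M^T M E."""
    n = W.shape[0]
    H = w / n + (1.0 - w / n) * W              # step 2, applied to every pair u != v
    np.fill_diagonal(H, 0.0)
    E = edge_matrix(H)
    M = rng.standard_normal((r, E.shape[0]))   # step 3: r x C(n,2) Gaussian matrix
    O = M @ E
    return O.T @ O / r                         # step 4

def estimate_cut(L_tilde, S, n, w):
    """Algorithm 2: undo the reweighting on the quadratic form 1_S^T L~ 1_S."""
    ind = np.zeros(n)
    ind[list(S)] = 1.0
    s = len(S)
    return float((ind @ L_tilde @ ind - w * s * (n - s) / n) / (1.0 - w / n))

# Utility-only demonstration with hypothetical small parameters (NOT privacy-preserving).
rng = np.random.default_rng(3)
n = 30
W = np.triu((rng.uniform(size=(n, n)) < 0.3).astype(float), k=1)
W = W + W.T
w_demo, r_demo = 3.0, 2000
L_tilde = perturbed_laplacian(W, w_demo, r_demo, rng)
S = [0, 1, 2, 3, 4]
estimate = estimate_cut(L_tilde, S, n, w_demo)
exact = float(W[:5, 5:].sum())
print(exact, estimate)
```

Since the published matrix is all a user needs, `estimate_cut` can be run by anyone for any S without further access to the graph.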



Theorem 3.1. Algorithm 1 preserves $(\epsilon, \delta)$-differential privacy w.r.t. edge changes in G. 

Theorem 3.2. For every $\eta, \nu > 0$ and a nonempty S of size s, Algorithm 2 gives a $(\eta, \tau, \nu)$-approximation 
for cut queries, for $\tau = O\left(s \cdot \frac{\sqrt{\ln(1/\delta)\ln(1/\nu)}}{\epsilon} \left(\ln(1/\delta) + \ln(\ln(1/\nu)/\eta^2)\right)\right)$. 

Clearly, once Algorithm 1 publishes $\tilde{L}$, any user interested in estimating $\Phi_G(S)$ for some 
nonempty $S \subset V(G)$ can run Algorithm 2 on her own. Also, observe that w is independent of 
n, which we think of as a large number, so we assume throughout the proofs of both theorems that 
both ^, ^ are < 1/2. Now, the proof of Theorem 3.2 is immediate from the JL Lemma. 

Proof of Theorem 3.2. Let us denote G as the input graph for Algorithm 1 and H as the graph 
resulting from the changes in edge-weights Algorithm 1 makes. Therefore, 

$$L_H = L_{\frac{w}{n} K_n + (1 - \frac{w}{n}) G} = \frac{w}{n} L_{K_n} + \left(1 - \frac{w}{n}\right) L_G$$

Fix S. The JL Lemma (Theorem 2.4) assures us that w.p. $\ge 1 - \nu$ we have 

$$(1 - \eta) \mathbf{1}_S^T L_H \mathbf{1}_S \le \mathbf{1}_S^T \tilde{L} \mathbf{1}_S \le (1 + \eta) \mathbf{1}_S^T L_H \mathbf{1}_S$$

The proof now follows from basic arithmetic and the value of w. 

$$R(S) \le \frac{1}{1 - \frac{w}{n}} \left( (1 + \eta) \frac{w}{n} s(n - s) + (1 + \eta)\left(1 - \frac{w}{n}\right) \mathbf{1}_S^T L_G \mathbf{1}_S - w \frac{s(n-s)}{n} \right)$$
$$\le (1 + \eta) \Phi_G(S) + \frac{1}{1 - \frac{w}{n}} \eta w \cdot s = (1 + \eta) \Phi_G(S) + \tau$$

where $\tau \le 2\eta w \cdot s$. The lower bound is obtained exactly the same way. □ 

Comment. The guarantee of Theorem 3.2 is not to be mistaken with a weaker guarantee of 
providing a good approximation to most cut-queries. Theorem 3.2 guarantees that any set of k 
predetermined cuts is well-approximated by Algorithm 2, assuming Algorithm 1 sets $\nu < 1/2k$. 
In contrast, giving a good approximation to most cuts can be done by a very simple (and privacy 
preserving) algorithm: by outputting the number of edges in the graph (with small Laplace noise). 
After all, we expect a cut to have $\frac{|E(G)|}{\binom{n}{2}} s(n - s)$ edges crossing it. 

We turn our attention to the proof of Theorem 3.1. We fix any two graphs G and G', which 
differ only on a single edge, (a, b). We think of (a, b) as an edge in G' which isn't present in G, and 
in the proof of Theorem 3.1 we identify G with the manipulation Algorithm 1 performs over G, 
and assume that the edge (a, b) is present in both graphs, only it has weight $\frac{w}{n}$ in G, and weight 1 
in G'. Clearly, this analysis carries on for a smaller change, when the edge (a, b) is present in both 
graphs but with different weights. (Recall, we assume all edge weights are bounded by 1.) 

Now, the proof follows from assuming that Algorithm 1 outputs the matrix $O = M E_G$, instead 
of $\tilde{L} = \frac{1}{r} O^T O$. (Clearly, outputting O allows one to reconstruct $\tilde{L}$.) Observe that O is composed of 
r identically distributed rows: each row is created by sampling a $\binom{n}{2}$-dimensional vector Y whose 
entries $\sim \mathcal{N}(0, 1)$, then outputting $Y^T E_G$. Therefore, we prove Theorem 3.1 by showing that each 
row maintains $(\epsilon_0, \delta_0)$-differential privacy, for the right parameters $\epsilon_0, \delta_0$. To match standard notation, 
we transpose row vectors to column vectors, and compare the distributions $E_G^T Y$ and $E_{G'}^T Y$. 

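Claim 3.3. Set $\epsilon_0 = \frac{\epsilon}{\sqrt{4 r \ln(2/\delta)}}$, $\delta_0 = \frac{\delta}{2r}$. Then, 

$$\forall x, \quad PDF_{E_G^T Y}(x) \le e^{\epsilon_0} PDF_{E_{G'}^T Y}(x) \qquad (1)$$

Denote $S = \{x : PDF_{E_G^T Y}(x) \ge e^{-\epsilon_0} PDF_{E_{G'}^T Y}(x)\}$. Then 

$$\Pr[S] \ge 1 - \delta_0 \qquad (2)$$

Proof of Theorem 3.1 based on Claim 3.3. Apply the composition theorem of [DRV10] for r iid 
samples each preserving $(\epsilon_0, \delta_0)$-differential privacy. □ 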
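
To see why these per-row parameters suffice, one can plug them into the k-fold composition bound of [DRV10], which states that k mechanisms that are each $(\epsilon_0, \delta_0)$-differentially private compose to $(\epsilon', k\delta_0 + \delta')$-differential privacy with $\epsilon' = \sqrt{2k\ln(1/\delta')}\,\epsilon_0 + k\epsilon_0(e^{\epsilon_0} - 1)$. The sketch below assumes the slack $\delta' = \delta/2$ (our choice for the illustration) and per-row parameters $\epsilon_0 = \epsilon/\sqrt{4r\ln(2/\delta)}$, $\delta_0 = \delta/(2r)$; function names are ours.

```python
import math

def per_row_parameters(eps, delta, r):
    """Per-row privacy parameters: eps0 = eps / sqrt(4 r ln(2/delta)), delta0 = delta / (2r)."""
    eps0 = eps / math.sqrt(4.0 * r * math.log(2.0 / delta))
    delta0 = delta / (2.0 * r)
    return eps0, delta0

def advanced_composition(eps0, delta0, k, delta_slack):
    """k-fold composition bound in the form of Dwork, Rothblum and Vadhan."""
    eps_total = (math.sqrt(2.0 * k * math.log(1.0 / delta_slack)) * eps0
                 + k * eps0 * (math.exp(eps0) - 1.0))
    delta_total = k * delta0 + delta_slack
    return eps_total, delta_total

eps, delta, r = 1.0, 0.01, 679
eps0, delta0 = per_row_parameters(eps, delta, r)
eps_total, delta_total = advanced_composition(eps0, delta0, r, delta / 2.0)
print(eps0, delta0, eps_total, delta_total)
```

The leading term evaluates to $\epsilon/\sqrt{2}$, which leaves room for the second-order term $r\epsilon_0(e^{\epsilon_0} - 1)$ in this example.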

To prove Claim 3.3 we denote $X = E_G^T Y$ and $X' = E_{G'}^T Y$. From the preliminaries it follows 
that X is a multivariate Gaussian distributed according to $\mathcal{N}(0, E_G^T I_{\binom{n}{2} \times \binom{n}{2}} E_G) = \mathcal{N}(0, L_G)$, and 
similarly, $X' \sim \mathcal{N}(0, L_{G'})$. In order to analyze the two distributions, $\mathcal{N}(0, L_G)$ and $\mathcal{N}(0, L_{G'})$, we 
now discuss several of the properties of $L_G$ and $L_{G'}$, then turn to the proof of Claim 3.3. 

First, it is clear from definition that the all ones vector, $\mathbf{1}$, belongs to the kernel space of $E_G$ 
and $E_{G'}$, and therefore to the kernel space of $L_G$ and $L_{G'}$. Next, we establish a simple fact. 

Fact 3.4. If G is a graph s.t. for every $u \neq v$ we have that $w_{u,v} > 0$, then $\mathbf{1}$ is the only vector (up to scaling) in 
the kernel space of $E_G$ and $L_G$. 

Proof. Any non-zero $x \perp \mathbf{1}$ has at least one positive coordinate and one negative coordinate, thus 
the non-negative sum $\|E_G x\|^2 = x^T L_G x = \sum_{u < v} w_{u,v}(x_u - x_v)^2$ is strictly positive. □ 

Therefore, the kernel space of both $L_G$ and $L_{G'}$ is exactly the 1-dimensional span of the $\mathbf{1}$ 
vector (for every possible outcome y of Y we have that $E_G^T y \cdot \mathbf{1} = E_{G'}^T y \cdot \mathbf{1} = 0$). Alternatively, both 
X and X' have support which is exactly $V = \mathbf{1}^\perp$. Hence, we only need to prove the inequalities of 
Claim 3.3 for $x \in V$. Secondly, observe that $L_{G'} = L_G + (1 - \frac{w}{n}) L_{a,b}$. Therefore, it holds that for 
every $x \in \mathbb{R}^n$ we have $x^T L_{G'} x = x^T L_G x + (1 - \frac{w}{n})(x_a - x_b)^2 \ge x^T L_G x$. In other words, $L_G \preceq L_{G'}$, 
a fact that yields several important corollaries. 

We now introduce notation for the Singular Value Decomposition of both $L_G$ and $L_{G'}$. We 
denote $E_G^T = U \Sigma V^T$ and $E_{G'}^T = U' \Lambda V'^T$, resulting in $L_G = U \Sigma^2 U^T$, $L_{G'} = U' \Lambda^2 U'^T$, $L_G^\dagger = U \Sigma^{-2} U^T$ 
and $L_{G'}^\dagger = U' \Lambda^{-2} U'^T$. We denote the singular values of $L_G$ as $\sigma_1^2 \ge \ldots \ge \sigma_{n-1}^2 > \sigma_n^2 = 0$, and the 
singular values of $L_{G'}$ as $\lambda_1^2 \ge \ldots \ge \lambda_{n-1}^2 > \lambda_n^2 = 0$. Weyl's inequality allows us to deduce the 
following fact. Its proof, and the proofs of the other facts, are in the appendix. 

Fact 3.5. Since $L_G \preceq L_{G'}$ then for every i we have that $\lambda_i^2 \ge \sigma_i^2$. 

In addition, since Algorithm 1 alters the input graph s.t. the complete graph $\frac{w}{n} K_n$ is contained 
in G, then it also holds that $\frac{w}{n} L_{K_n} \preceq L_G$, and so Fact 3.5 gives that for every $1 \le i \le n - 1$ we 
have that $\sigma_i^2 \ge w = \frac{w}{n} \cdot n$. (It is simple to see that the eigenvalues of $L_{K_n}$ are {n, n, ..., n, 0}.) 
Furthermore, as $L_{G'} = L_G + (1 - \frac{w}{n}) L_{a,b}$ and the singular values of $L_{a,b}$ are {2, 0, 0, ..., 0}, then we 
have that 

$$\sum_i \lambda_i^2 = tr(L_{G'}) \le tr(L_G) + tr\left((1 - \frac{w}{n}) L_{a,b}\right) \le \sum_i \sigma_i^2 + 2$$
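
These spectral facts are easy to confirm numerically on a small instance. The sketch below uses hypothetical small values (n = 8, w = 2) and our own variable names: H is a graph after the reweighting of Algorithm 1 in which the pair (a, b) originally had weight 0, and H2 is its neighbor in which that pair has weight 1.

```python
import numpy as np

def laplacian(W):
    """Graph Laplacian: weighted degrees on the diagonal, -w_{u,v} off the diagonal."""
    return np.diag(W.sum(axis=1)) - W

n, w = 8, 2.0
a, b = 1, 5
rng = np.random.default_rng(4)
W = np.triu(rng.uniform(size=(n, n)), k=1)
W = W + W.T

# Reweighting of Algorithm 1: every pair gets weight at least w/n.
H = w / n + (1.0 - w / n) * W
np.fill_diagonal(H, 0.0)
H[a, b] = H[b, a] = w / n          # pair (a,b) had original weight 0
H2 = H.copy()
H2[a, b] = H2[b, a] = 1.0          # neighbor: pair (a,b) has weight 1

# Eigenvalues in decreasing order: sigma_1^2 >= ... >= sigma_n^2 and lambda_1^2 >= ... >= lambda_n^2.
sigma2 = np.sort(np.linalg.eigvalsh(laplacian(H)))[::-1]
lam2 = np.sort(np.linalg.eigvalsh(laplacian(H2)))[::-1]
kn = np.sort(np.linalg.eigvalsh(laplacian(np.ones((n, n)) - np.eye(n))))[::-1]
trace_gap = float(lam2.sum() - sigma2.sum())   # equals 2 * (1 - w/n)
```
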



Another fact we can deduce from $L_G \preceq L_{G'}$ is the following. 

Fact 3.6. Since the kernels of $L_G$ and of $L_{G'}$ are identical, then for every x it holds that $x^T L_{G'}^\dagger x \le x^T L_G^\dagger x$. 
Symbolically, $L_G \preceq L_{G'} \Rightarrow L_{G'}^\dagger \preceq L_G^\dagger$. 



Having established the above facts, we can turn to the proof of privacy. 

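Proof of Claim 3.3. We first prove the upper bound in (1). As mentioned, we focus only on 
$x \in V = \mathbf{1}^\perp$, where 

$$PDF_{E_G^T Y}(x) = \left((2\pi)^{n-1} \tilde{\det}(L_G)\right)^{-1/2} \exp\left(-\frac{1}{2} x^T L_G^\dagger x\right)$$
$$PDF_{E_{G'}^T Y}(x) = \left((2\pi)^{n-1} \tilde{\det}(L_{G'})\right)^{-1/2} \exp\left(-\frac{1}{2} x^T L_{G'}^\dagger x\right)$$

As noted above, we have that for every x it holds that $x^T L_{G'}^\dagger x \le x^T L_G^\dagger x$, so $\exp(-\frac{1}{2} x^T L_G^\dagger x) \le \exp(-\frac{1}{2} x^T L_{G'}^\dagger x)$. 
Denoting $\Delta_i = \lambda_i^2 - \sigma_i^2 \ge 0$, and recalling that $\sum_i \Delta_i \le 2$ and that $\forall i, \sigma_i^2 \ge w$ it holds that 

$$\frac{PDF_{E_G^T Y}(x)}{PDF_{E_{G'}^T Y}(x)} \le \sqrt{\prod_{i=1}^{n-1} \frac{\lambda_i^2}{\sigma_i^2}} = \sqrt{\prod_{i=1}^{n-1} \left(1 + \frac{\Delta_i}{\sigma_i^2}\right)} \le \exp\left(\frac{1}{2w} \sum_i \Delta_i\right) \le e^{1/w} \le e^{\epsilon / \sqrt{4 r \ln(2/\delta)}} = e^{\epsilon_0}$$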
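We now turn to the lower bound of (2). We start with analyzing the term $x^T L_G^\dagger x$ that appears 
in $PDF_{E_G^T Y}(x)$. Again, we emphasize that $x \in V$, justifying the very first equality below. 

$$x^T L_G^\dagger x = x^T L_G^\dagger L_{G'} L_{G'}^\dagger x = x^T L_G^\dagger \left(L_G + (1 - \frac{w}{n}) L_{a,b}\right) L_{G'}^\dagger x$$
$$= x^T L_{G'}^\dagger x + (1 - \frac{w}{n}) \, x^T L_G^\dagger L_{a,b} L_{G'}^\dagger x$$
$$= x^T L_{G'}^\dagger x + (1 - \frac{w}{n}) \, x^T L_G^\dagger e_{a,b} \cdot e_{a,b}^T L_{G'}^\dagger x$$

Therefore, if we show that 

$$\Pr_x\left[\frac{1}{2}\left(1 - \frac{w}{n}\right) x^T L_G^\dagger e_{a,b} \cdot e_{a,b}^T L_{G'}^\dagger x > \epsilon_0\right] \le \delta_0 \qquad (3)$$

then it holds that w.p. $\ge 1 - \delta_0$ we have 

$$\frac{PDF_{E_G^T Y}(x)}{PDF_{E_{G'}^T Y}(x)} \ge 1 \cdot \exp\left(-\frac{1}{2} x^T (L_G^\dagger - L_{G'}^\dagger) x\right) = \exp\left(-\frac{1}{2}\left(1 - \frac{w}{n}\right) x^T L_G^\dagger e_{a,b} \cdot e_{a,b}^T L_{G'}^\dagger x\right) \ge e^{-\epsilon_0}$$

which proves the lower bound of (2). We turn to proving (3). 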

Denote $term_1 = e_{a,b}^T L_G^\dagger x$ and $term_2 = e_{a,b}^T L_{G'}^\dagger x$. Since $x = E_G^T y$ where $y \sim Y$, then $term_i$ is 
distributed like $vec_i^T Y$ where $vec_1 = E_G L_G^\dagger e_{a,b}$ and $vec_2 = E_G L_{G'}^\dagger e_{a,b}$. The naive bound, 
$\|vec_1\| \le \|E_G\| \|L_G^\dagger\| \|e_{a,b}\|$, gives a bound on the size of $vec_1$ which is dependent on the ratio $\sigma_1 / \sigma_{n-1}^2$. We can 
improve the bound, on both $\|vec_1\|$ and $\|vec_2\|$, using the SVD of $E_G$ and $E_{G'}$. 



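$$\|vec_1\| = \|E_G L_G^\dagger e_{a,b}\| = \|V \Sigma U^T U \Sigma^{-2} U^T e_{a,b}\| = \|V \Sigma^{-1} U^T e_{a,b}\|$$
$$\le \|V\| \|\Sigma^{-1}\| \|U^T\| \|e_{a,b}\| = 1 \cdot \sigma_{n-1}^{-1} \cdot 1 \cdot \sqrt{2} \le \frac{\sqrt{2}}{\sqrt{w}}$$

$$\|vec_2\| = \|E_G L_{G'}^\dagger e_{a,b}\| = \|(E_{G'} - (1 - \sqrt{w/n}) E_{a,b}) L_{G'}^\dagger e_{a,b}\| \le \|E_{G'} L_{G'}^\dagger e_{a,b}\| + \|E_{a,b} L_{G'}^\dagger e_{a,b}\|$$
$$\stackrel{(*)}{\le} \lambda_{n-1}^{-1} \cdot \sqrt{2} + \|E_{a,b} L_{G'}^\dagger e_{a,b}\| \stackrel{(**)}{=} \lambda_{n-1}^{-1} \cdot \sqrt{2} + e_{a,b}^T L_{G'}^\dagger e_{a,b} \le \frac{\sqrt{2}}{\sqrt{w}} + \frac{2}{w}$$

where the bound in (*) is derived just like in $vec_1$ (using $E_{G'} L_{G'}^\dagger e_{a,b} = V' \Lambda U'^T U' \Lambda^{-2} U'^T e_{a,b}$), and 
the equality in (**) follows from the fact that all coordinates in the vector $E_{a,b} L_{G'}^\dagger e_{a,b}$ are zero, 
except for the coordinate indexed by the (a, b) pair. 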

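We now use the fact that $term_1$ and $term_2$ are both linear combinations of i.i.d $\mathcal{N}(0, 1)$ 
random variables. Therefore for i = 1, 2 we have that $term_i \sim \mathcal{N}(0, \|vec_i\|^2)$ so 
$\Pr[|term_i| > \sqrt{2\log(2/\delta_0)} \, \|vec_i\|] \le e^{-\log(2/\delta_0)} = \frac{\delta_0}{2}$. It follows that w.p. $\ge 1 - \delta_0$ both 
$|term_1| \le \sqrt{2\log(2/\delta_0)} \sqrt{\frac{2}{w}}$ and $|term_2| \le \sqrt{2\log(2/\delta_0)} \sqrt{\frac{4}{w}}$ (using $\frac{\sqrt{2}}{\sqrt{w}} + \frac{2}{w} \le \frac{2}{\sqrt{w}}$, which holds for $w \ge 12$), 
so $term_1 \cdot term_2 \le 2\sqrt{8} \log(2/\delta_0)/w$. Plugging in the value of w, we 
have that $\Pr[term_1 \cdot term_2 \le 2\epsilon_0] \ge 1 - \delta_0$ which concludes the proof of (3) and of Claim 3.3. □ 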
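
The norm bounds on the two vectors can be checked numerically. The sketch below uses hypothetical small values (n = 8, w = 2) and our own helper names; it assumes the bounds $\|vec_1\| \le \sqrt{2/w}$ and $\|vec_2\| \le \sqrt{2/w} + 2/w$ with $vec_1 = E_G L_G^\dagger e_{a,b}$ and $vec_2 = E_G L_{G'}^\dagger e_{a,b}$, and also confirms that $term_i = e_{a,b}^T L^\dagger x$ coincides with $vec_i^T Y$ when $x = E_G^T Y$.

```python
import itertools
import numpy as np

def edge_matrix(W):
    """Edge matrix: one row per pair u < v, +sqrt(w) at column u, -sqrt(w) at column v."""
    n = W.shape[0]
    pairs = list(itertools.combinations(range(n), 2))
    E = np.zeros((len(pairs), n))
    for row, (u, v) in enumerate(pairs):
        E[row, u] = np.sqrt(W[u, v])
        E[row, v] = -np.sqrt(W[u, v])
    return E

def pseudo_inverse(L, tol=1e-9):
    """Moore-Penrose inverse of a symmetric PSD matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(L)
    inv = np.zeros_like(vals)
    keep = vals > tol
    inv[keep] = 1.0 / vals[keep]
    return (vecs * inv) @ vecs.T

n, w = 8, 2.0
a, b = 1, 5
rng = np.random.default_rng(6)
W = np.triu(rng.uniform(size=(n, n)), k=1)
W = W + W.T
H = w / n + (1.0 - w / n) * W      # graph G after the reweighting of Algorithm 1
np.fill_diagonal(H, 0.0)
H[a, b] = H[b, a] = w / n          # pair (a,b): weight w/n in G ...
H2 = H.copy()
H2[a, b] = H2[b, a] = 1.0          # ... and weight 1 in the neighbor G'

E, E2 = edge_matrix(H), edge_matrix(H2)
L_pinv, L2_pinv = pseudo_inverse(E.T @ E), pseudo_inverse(E2.T @ E2)
e_ab = np.zeros(n)
e_ab[a], e_ab[b] = 1.0, -1.0

vec1 = E @ L_pinv @ e_ab
vec2 = E @ L2_pinv @ e_ab
Y = rng.standard_normal(E.shape[0])
x = E.T @ Y                        # one sample of X = E_G^T Y
term1 = float(e_ab @ L_pinv @ x)
term2 = float(e_ab @ L2_pinv @ x)
```
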
3.2 Discussion and Comparison with Other Algorithms 

Recently, Gupta et al [GRU12] have also considered the problem of answering cut-queries while 
preserving differential privacy, examining both an iterative database construction approach (e.g., based 
on the multiplicative-weights method) and a randomized-response approach. Here, we compare this 
and other methods to our algorithm. We compare them along several axes: the dependence on n 
and s (number of vertices in G and in S resp.), the dependence on $\epsilon$, and the dependence on k, the 
number of queries answered by the mechanism. Other parameters are omitted. The bottom line 
is that for a long non-adaptive query sequence, our approach dominates in the case that s = o(n). 
The results are summarized in Table 1. 

Note, comparing the dependence on k for interactive and non-interactive mechanisms is not 
straightforward. In general, non-interactive mechanisms are more desirable than interactive 
mechanisms, because interactive mechanisms require a central authority that serves as the only way 
users can interact with the database. However, interactive mechanisms can answer k adaptively 
chosen queries. In order for non-interactive mechanisms to do so, they have to answer correctly 
on $\min\{\exp(O(k)), 2^n\}$ queries. This is why outputting a sanitized database is often considered a 
harder task than interactively answering user queries. We therefore compare answering k adaptively 
chosen queries for interactive mechanisms, and k predetermined queries for non-interactive 
mechanisms. 

3.2.1 Our Algorithm 

Clearly, our algorithm is non-interactive. As such, if we wish to answer correctly w.h.p. a set of 
$k$ predetermined queries, we set $\nu' = \nu/k$, and deduce that the amount of noise added to each 
query is $O(s\sqrt{\log(k)}/\epsilon)$. So, if we wish to answer all $2^n$ cut queries correctly, 
our noise is set to $O(s\sqrt{n}/\epsilon)$. An interesting observation is that when we aim to 
answer all $2^n$ queries, we generate an iid normal matrix of size $r \times n$ where $r > n$. 
Therefore, we now apply the JL transform to increase the dimensionality of the problem rather than 
to decrease it. This clearly sets privacy preservation apart from all other applications of the JL 
transform. 

In addition, we comment that our algorithm can be implemented in a distributed fashion, where 
node $i$ repeats the following procedure $r$ times (where $r$ is the number of rows in the matrix 
picked by Algorithm 1): First, $i$ picks $n - i - 1$ iid samples from $\mathcal{N}(0,1)$ and sends 
the $j$-th sample, $x_j$, to node $i + j$. Once node $i$ receives $i - 1$ values from nodes 
$1, 2, \ldots, i-1$, it outputs the weighted sum 
$\sum_{j \ne i} (-1)^{\mathbb{1}\{j<i\}}\, x_j \left(\sqrt{w/n} + w_{i,j}\left(1 - \sqrt{w/n}\right)\right)$ 
(where $(-1)^{\mathbb{1}\{j<i\}}$ denotes $-1$ if $j < i$, or $1$ otherwise). 
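
The distributed procedure above can be sketched as follows. This is only a minimal single-process simulation under our own assumptions: the graph is unweighted and given as a set of pairs, the parameter `w` is a stand-in for the value chosen by Algorithm 1, and the function name `one_round` is hypothetical. Every pair of nodes shares one Gaussian sample, which enters the two nodes' outputs with opposite signs, so the $n$ outputs of a round always sum to zero up to rounding.

```python
import math
import random

def one_round(n, edges, w, rng):
    """Simulate one of the r rounds and return the n node outputs.

    edges: set of pairs (a, b) with a < b (0-indexed, unweighted graph).
    w: stand-in for the parameter w of Algorithm 1 (assumed to be given).
    """
    base = math.sqrt(w / n)
    out = [0.0] * n
    for a in range(n):
        for b in range(a + 1, n):
            x = rng.gauss(0.0, 1.0)        # sample picked by node a, sent to node b
            w_ab = 1.0 if (a, b) in edges else 0.0
            coeff = base + w_ab * (1.0 - base)
            out[a] += x * coeff            # partner has larger index: sign +1
            out[b] -= x * coeff            # partner has smaller index: sign -1
    return out

rng = random.Random(0)
outputs = one_round(5, {(0, 1), (1, 2), (3, 4)}, w=2.0, rng=rng)
```

Repeating `one_round` $r$ times and collecting the outputs gives the $r$ projected rows without any node seeing the whole graph.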

3.2.2 Naively Adding Laplace Noise 

The most basic of all differentially private mechanisms is the classical Laplace mechanism, which 
is interactive. A user poses a cut-query $S$ and the mechanism replies with 
$\Phi_G(S) + Lap(0, \epsilon^{-1})$ (since the global sensitivity of cut-queries is 1). The 
composition theorem of [DRV10] assures us that for $k$ queries we preserve 
$(O(\sqrt{k}\epsilon), \delta)$-privacy. As a result, the mechanism completely obfuscates the true 
answer once $k$ is large enough, and even for $k = n^2$ it has noise proportional to $n/\epsilon$. 

3.2.3 The Randomized Response Mechanism 

The "Randomized Response" algorithm perturbs the edges of a graph in a way that allows us 
to publish the result and still preserve privacy. Given $G$, the Randomized Response algorithm 
constructs a weighted graph $H$ where for every $u, v \in V(G)$, the weight of the edge $(u,v)$ in 
$H$, denoted $w'_{u,v}$, is chosen independently to be either $1$ or $-1$, s.t. 

$$\Pr[w'_{u,v} = 1] = \frac{1 + \epsilon w_{u,v}}{2} \quad\text{and}\quad \Pr[w'_{u,v} = -1] = \frac{1 - \epsilon w_{u,v}}{2}.$$

Clearly, this algorithm maintains $\epsilon$-differential edge privacy: two neighboring graphs 
differ on a single edge, $(a,b)$, and obviously 

$$\Pr[w'_{a,b} = 1 \mid w_{a,b} = 1] \le (1+\epsilon)\Pr[w'_{a,b} = 1 \mid w_{a,b} = 0].$$

In addition, it is also evident that for every nonempty $S \subset V(G)$, we have that 
$E\left[\frac{1}{\epsilon}\sum_{u \in S, v \in \bar{S}} w'_{u,v}\right] = \Phi_G(S)$, yet the 
variance of this r.v. is $\Omega(s(n-s))$. Therefore, a classical Hoeffding-type bound gives that 
for any nonempty $S \subset V(G)$ and every $0 < \nu < 1/2$, 

$$\Pr\left[\left|\frac{1}{\epsilon}\sum_{u \in S, v \in \bar{S}} w'_{u,v} - \Phi_G(S)\right| > \frac{\sqrt{2\log(1/\nu)\,s(n-s)}}{\epsilon}\right] < 2\nu.$$

Observe that while $\sqrt{s(n-s)}$ is comparable with $s$ when $s = \Omega(n)$, there are cuts 
(namely, cuts with $s = O(1)$) where $\sqrt{s(n-s)}/s = \Omega(\sqrt{n})$. More generally, the 
additive noise of Randomized Response is a factor $\sqrt{n/s}$ worse than our algorithm. We comment 
that the Randomized Response algorithm can also be performed in a distributed fashion, and in 
contrast to our algorithm, it has no multiplicative error. In addition, the above analysis holds 
for any linear combination of edges, not just the $s(n-s)$ potential edges that cross the 
$(S, \bar{S})$ cut. So given $E' \subset E(G)$ it is possible to approximate $\sum_{e \in E'} w_e$ 
up to $\pm O\left(\sqrt{|E'|\log(1/\nu)}/\epsilon\right)$ w.p. $\ge 1 - 2\nu$. In particular, for 
queries regarding an $(S,T)$-cut (where $S, T$ are two disjoint subsets of vertices) we can 
estimate the answer up to $\pm O\left(\sqrt{|S||T|\log(1/\nu)}/\epsilon\right)$. We also comment 
that the version of Randomized Response presented here differs slightly from the version of 
[GRU12]. In particular, it is possible to address their concern regarding outputting a sanitized 
graph with non-negative weights by an affine transformation taking $\{-1, 1\} \to \{0, 1\}$. 
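
The Randomized Response scheme described above can be sketched in a few lines. This is an illustrative sketch rather than the mechanism of [GRU12]: it assumes an unweighted graph given as a set of pairs with $u < v$, and the function names `randomized_response` and `estimate_cut` are ours. Each pair is published as $+1$ with probability $(1 + \epsilon w_{u,v})/2$ and as $-1$ otherwise, and a cut is estimated by summing the published weights across the cut and dividing by $\epsilon$.

```python
import itertools
import random

def randomized_response(n, edges, eps, rng):
    """Return perturbed weights w'[(u, v)] in {-1, +1} for every pair u < v."""
    perturbed = {}
    for u, v in itertools.combinations(range(n), 2):
        w_uv = 1.0 if (u, v) in edges else 0.0
        p_plus = (1.0 + eps * w_uv) / 2.0      # Pr[w' = +1]
        perturbed[(u, v)] = 1 if rng.random() < p_plus else -1
    return perturbed

def estimate_cut(perturbed, S, eps):
    """Unbiased estimate: (1/eps) * sum of w' over pairs crossing (S, complement)."""
    S = set(S)
    total = sum(wp for (u, v), wp in perturbed.items() if (u in S) != (v in S))
    return total / eps

rng = random.Random(1)
H = randomized_response(6, {(0, 1), (2, 3)}, eps=0.5, rng=rng)
estimate = estimate_cut(H, {0, 2}, eps=0.5)
```

A single estimate is very noisy for small graphs; the Hoeffding-type bound above describes how the noise scales with $s(n-s)$.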



3.2.4 Exponential Mechanism / BLR 

The exponential mechanism [MT07, BLR08] is a non-interactive privacy preserving mechanism, 
which is typically intractable. To implement it for cut-queries one needs to (a) specify a range of 
potential outputs and (b) give a scoring function over potential outputs s.t. a good output's score 
is much higher than all bad outputs' scores. 

One such set of potential outputs is derived from edge-sparsifiers. Given a graph $G$ we say that 
$H$ is an edge-sparsifier for $G$ if for any nonempty $S \subset V(G)$ it holds that 
$\Phi_H(S) \in (1 \pm \eta)\Phi_G(S)$. There is a rich literature on sparsifiers (see [BK96, ST04, 
SS08]), and the current best known construction [BSS09] gives a (weighted) sparsifier with 
$O(n/\eta^2)$ edges with all edge-weights $\le poly(n)$. By describing every edge's two endpoints 
and weight, we have that such edge-sparsifiers can be described using $O(n\log(n))$ bits (omitting 
dependence on $\eta$). Thus, the size of the set of all sparsifiers is bounded above by 
$\exp(O(n\log(n)))$. Given an input graph $G$ and a weighted graph $H$ we can score $H$ using 
$q(G,H) = \max_S \left\{\min_{a:\,|a-1| \le \eta} |\Phi_H(S)/a - \Phi_G(S)|\right\}$. Observe that 
if we change $G$ to a neighboring graph $G'$, then the score changes by at most 1. 

Putting it all together, we have that given input $G$ the exponential mechanism gives a score of 
$e^{-\epsilon q(G,H)/2}$ to each possible output. The edge-sparsifier of $G$ gets a score of 1, 
whereas every graph with $q(G,H) > \tau$ gets a score of at most $e^{-\epsilon\tau/2}$. So if we 
wish to claim we output a graph whose error is $> \tau$ w.p. at most $\nu$, then we need to set 
$\exp(n\log(n) - \epsilon\tau/2) \le \nu$. It follows that $\tau$ is proportional to 
$n\log(n)/\epsilon$. Note however that the additive error of this mechanism is independent of the 
number of queries it answers correctly. 

We comment that even though we managed to find a range of size $2^{O(n\log(n))}$, it is possible 
to show that the range of the mechanism has to be $2^{\Omega(n)}$. (Fix $\alpha < 1/2$ and think of 
a set of inputs $\mathcal{G}$ where each $G \in \mathcal{G}$ has $n/2$ vertices with degree 
$n^{\alpha}$ and $n/2$ vertices with degree $n^{2\alpha}$. Preserving all cuts of size 1 up to 
$(1 \pm \eta)$ requires our output to have vertices of degree $\ge (1-\eta)n^{2\alpha}$ and 
vertices of degree $\le (1+\eta)n^{\alpha}$. Therefore, by representing vertices of high and low 
degree using a binary vector, there exists an injective mapping of balanced $\{0,1\}^n$-vectors 
onto the set of potential outputs.) Thus, unless one can devise a scoring function of lower 
sensitivity, the exponential mechanism is bound to have additive error proportional to 
$n/\epsilon$. 

3.2.5 The Multiplicative Weights Mechanism 

The very elegant Multiplicative Weights mechanism of Hardt and Rothblum [HR10] can be adapted 
as well for answering cut queries. In the Multiplicative Weights mechanism, a database is 
represented by a histogram over all "types" of individuals that exist in a certain universe. In our 
case, each pair of vertices is a type, and each entry in the database is an edge detailing its 
weight. Thus, $N = \binom{n}{2}$ and the database length is $|E|$, and each query $S$ corresponds 
to taking a dot-product between this histogram and the $\binom{n}{2}$-length binary vector 
indicating the edges that cross the cut. Plugging these parameters into the main theorem of [HR10], 
we get an adaptive mechanism that answers $k$ queries with additive noise of 
$O(\sqrt{|E|}\log(k)/\epsilon)$. 

We should mention that the Multiplicative Weights mechanism, in contrast to ours, always 
answers correctly with no multiplicative error and can deal with $k$ adaptively chosen queries. 
Furthermore, it allows one to answer any linear query on the edges, not just cut-queries, and in 
particular to answer $(S,T)$-cut queries. However, its additive error is bigger than ours, and 
should we choose to set $k = 2^n$ (meaning, answering all cut-queries) then its additive error 
becomes $O(n\sqrt{|E|}/\epsilon)$ (in contrast to our $O(s\sqrt{n}/\epsilon)$). 

Gupta et al. [GRU12] have improved on the bounds of the Multiplicative Weights mechanism by 
generalizing it as an "Iterative Database Construction" mechanism, and providing a tighter analysis 
of it. In particular, they have reduced the dependency on $\epsilon$ to $1/\sqrt{\epsilon}$. 
Overall, their additive error is $O(\sqrt{|E|\log(k)}/\sqrt{\epsilon})$, which for the case of all 
cut-queries is $O(\sqrt{n|E|/\epsilon})$. 

4 Publishing a Covariance Matrix 
4.1 The Algorithm 

In this section, we are concerned with the question of allowing users to estimate the covariance of 
a given sample data along an arbitrary direction $x$. We think of our input as an $n \times d$ 
matrix $A$, and we maintain privacy w.r.t. changing the coordinates of a single row s.t. a vector 
$v$ of size 1 is added to that row. We now detail our algorithm for publishing the covariance 
matrix of $A$. Observe that in addition to the variance, we can output 
$\mu = \frac{1}{n}A^T\mathbf{1}$, the mean of all samples in $A$, in a differentially 

Footnote: Observe that it is not possible to assume $|E| = O(n)$ using sparsifiers, because 
sparsifiers output a weighted graph with edge-weights $O(n)$. Since the Multiplicative Weights 
mechanism views the database as a histogram, the overall resolution of the problem remains roughly 
$n^2$ in the worst case. 
| Method | Additive error for any $k$ | Additive error for all cuts | Multiplicative error? | Interactive? | Tractable? | Comments |
|---|---|---|---|---|---|---|
| Laplace Noise [DMNS06] | $O(\sqrt{k}/\epsilon)$ | $O(2^{n/2}/\epsilon)$ | no | yes | yes | |
| Randomized Response | $O(\sqrt{sn\log(k)}/\epsilon)$ | $O(n\sqrt{s}/\epsilon)$ | no | no | yes | Can be distributed; answers $(S,T)$-cut queries |
| Exponential Mechanism [MT07, BLR08] | $O(n\log(n)/\epsilon)$ | $O(n\log(n)/\epsilon)$ | yes | no | no | Error independent of $k$ |
| MW [HR10] | $O(\sqrt{|E|}\log(k)/\epsilon)$ | $O(n\sqrt{|E|}/\epsilon)$ | no | yes | yes | Answers $(S,T)$-cut queries |
| IDC [GRU12] | $O(\sqrt{|E|\log(k)/\epsilon})$ | $O(\sqrt{n|E|/\epsilon})$ | no | yes | yes | Answers $(S,T)$-cut queries |
| JL | $O(s\sqrt{\log(k)}/\epsilon)$ | $O(s\sqrt{n}/\epsilon)$ | yes | no | yes | Can be distributed |

Table 1: Comparison between mechanisms for answering cut-queries. $\epsilon$ - privacy parameter; 
$n$ and $|E|$ - number of vertices and edges resp.; $s$ - number of vertices in a query; $k$ - 
number of queries. 



private manner by adding random Gaussian noise. (We merely output 
$\tilde{\mu} = \mu + \mathcal{N}(0, \sigma^2 I_{d \times d})$ for an appropriately scaled variance 
$\sigma^2$.) We denote by $I_{n \times d}$ the $n \times d$ matrix whose main diagonal has 1 in 
each coordinate and all other coordinates are 0. 

Algorithm 3: Outputting a Covariance Matrix while Preserving Differential Privacy 
Input: An $n \times d$ matrix $A$. Parameters $\epsilon, \delta, \eta, \nu > 0$. 

1 Set r = and w = W^M^M ln(16r/<^). 

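2 Subtract the mean from $A$ by computing $A \leftarrow A - \frac{1}{n}\mathbf{1}\mathbf{1}^T A$. 

3 Compute the SVD of $A = U\Sigma V^T$. 

4 Set $A \leftarrow U\left(\sqrt{\Sigma^2 + w^2 I_{n \times d}}\right)V^T$. 

5 Pick a matrix $M$ of size $r \times n$ whose entries are iid samples of $\mathcal{N}(0,1)$. 

6 return $C = \frac{1}{r}A^T M^T M A$. 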



Algorithm 4: Approximating $\Phi_A(x)$ 
Input: A unit-length vector $x$, parameter $w$ and a covariance matrix $C$ from Algorithm 3. 
return $R(x) = x^T C x - w^2$. 

Theorem 4.1. Algorithm 3 preserves $(\epsilon, \delta)$-differential privacy. 

Theorem 4.2. Algorithm 4 is an $(\eta, \tau, \nu)$-approximation for directional variance queries, 
where $\tau = \eta w^2$. 

Proof of Theorem 4.2. Again, the proof is immediate from the JL Lemma, and straightforward 
arithmetic gives that for every $x$, w.p. $\ge 1 - \nu$, we have that 

$$(1 - \eta)\Phi_A(x) - \eta w^2 \le R(x) \le (1 + \eta)\Phi_A(x) + \eta w^2$$

so $\tau = \eta w^2$. □ 
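
To see why Algorithm 4 subtracts $w^2$, it may help to look at the published matrix in expectation. The sketch below is our own illustration, not part of the algorithm: it assumes $A$ is already centered, that $n \ge d$, and that $\Phi_A(x) = x^T A^T A x$; the function names are hypothetical. Since $E[M^T M] = r I$ for a Gaussian $M$, the expectation of $C$ is the Gram matrix of the shifted input, which equals $A^T A + w^2 I$, so $R(x)$ recovers $\Phi_A(x)$ exactly in expectation.

```python
def expected_C(A, w):
    """E[C] = A^T A + w^2 I for a centered n x d matrix A (given as a list of rows)."""
    d = len(A[0])
    C = [[sum(row[i] * row[j] for row in A) for j in range(d)] for i in range(d)]
    for i in range(d):
        C[i][i] += w * w
    return C

def R(C, x, w):
    """Algorithm 4: x^T C x - w^2."""
    d = len(x)
    quad = sum(x[i] * C[i][j] * x[j] for i in range(d) for j in range(d))
    return quad - w * w

A = [[1.0, -2.0], [-1.0, 2.0], [0.0, 0.0]]   # columns already sum to zero
x = [0.6, 0.8]                                # unit-length direction
w = 3.0                                       # stand-in value, not the one from Algorithm 3
true_variance = sum((row[0] * x[0] + row[1] * x[1]) ** 2 for row in A)
estimate = R(expected_C(A, w), x, w)
```

With a finite number of rows $r$ the actual $C$ fluctuates around this expectation, which is where the multiplicative factor $(1 \pm \eta)$ in Theorem 4.2 comes from.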
Comment. We wish to clarify that Theorem 4.2 does not mean that we publish a matrix $C$ 
which is a low-rank approximation to $A^T A$. It is also not a matrix on which one can compute an 
approximated PCA of $A$, even if we set $\nu = 1/poly(d)$. The matrix $C$ should be thought of as a 
"test-matrix": if you believe $A$ has high directional variance along some direction $x$ then you 
can test your hypothesis on $C$ and (w.h.p.) get a good approximate answer. However, we do not 
guarantee that the singular values of $A^T A$ and of $C$ are close or that the eigenvectors of 
$A^T A$ and $C$ are comparable. (See the discussion in Section 5.) 



Proof of Theorem 4.1. Fix two neighboring $A$ and $A'$. We often refer to the gap matrix $A' - A$ 
as $E$. Observe that $E$ is a rank-1 matrix, which we denote as the outer-product $E = e_i v^T$ 
($e_i$ is the indicator vector of row $i$ and $v$ is a vector of norm 1). As such, the singular 
values of $E$ are exactly $\{1, 0, \ldots, 0\}$. 

The proof of the theorem is composed of two stages. The first stage is the simpler one. We 
ignore step 4 of Algorithm 3 (shifting the singular values), and work under the premise that both 
$A$ and $A'$ have singular values no less than $w$. In the second stage we denote $B$ and $B'$ as 
the results of applying step 4 to $A$ and $A'$ resp., and show what adaptations are needed to make 
the proof follow through. 

Stage 1. 

We assume step 4 was not applied, and all singular values of $A$ and $A'$ are at least $w$. 

As in the proof of Theorem 3.1, the proof follows from the assumption that Algorithm 3 outputs 
$O = A^T M^T$ (which clearly allows us to reconstruct $C = \frac{1}{r} O O^T$). Again, $O$ is 
composed of $r$ columns, each an iid sample from $A^T Y$ where $Y \sim \mathcal{N}(0, I_{n \times n})$. 
We now give the analogous claim to Claim 3.3. 

Claim 4.3. Fix $\epsilon_0 = \frac{\epsilon}{\sqrt{4r\ln(2/\delta)}}$ and 
$\delta_0 = \frac{\delta}{2r}$. Denote 
$S = \{x : e^{-\epsilon_0} PDF_{A'^T Y}(x) \le PDF_{A^T Y}(x) \le e^{\epsilon_0} PDF_{A'^T Y}(x)\}$. 
Then $\Pr[S] \ge 1 - \delta_0$. 

Again, the composition theorem of [DRV10] along with the choice of $r$ gives that overall we 
preserve $(\epsilon, \delta)$-differential privacy. □ 



Proof of Claim 4.3. The proof mimics the proof of Claim 3.3, but there are two subtle differences. 
First, the problem is simpler notation-wise, because $A$ and $A'$ both have full rank due to 
Algorithm 3. Secondly, the problem becomes more complicated and requires that we use some heavier 
machinery, because the singular values of $A'$ aren't necessarily bigger than the singular values 
of $A$. Details follow. 

First, let us formally define the PDF of the two distributions. Again, we apply the fact that 
$A^T Y$ and $A'^T Y$ are linear transformations of $\mathcal{N}(0, I_{n \times n})$: 

$$PDF_{A^T Y}(x) = \frac{1}{\sqrt{(2\pi)^d \det(A^T A)}} \exp\left(-\frac{1}{2} x^T (A^T A)^{-1} x\right)$$

$$PDF_{A'^T Y}(x) = \frac{1}{\sqrt{(2\pi)^d \det(A'^T A')}} \exp\left(-\frac{1}{2} x^T (A'^T A')^{-1} x\right)$$



Footnote: For convenience, we ignore the part of the algorithm that subtracts the mean of the rows 
of $A$. Observe that if $E = A - A'$ then after subtracting the mean from each row, the difference 
between the two matrices is $\tilde{e}_i v^T$ where $\tilde{e}_i$ is simply subtracting $1/n$ from 
each coordinate of $e_i$. Since $\|\tilde{e}_i\| \le \|e_i\|$, this has no effect on the analysis. 
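Our proof proceeds as follows. First, we show 

$$e^{-\epsilon_0/2} \le \sqrt{\frac{\det(A'^T A')}{\det(A^T A)}} \le e^{\epsilon_0/2} \qquad (4)$$

Then we show that no matter whether we sample $x$ from $A^T Y$ or from $A'^T Y$, we have that 

$$\Pr\left[\frac{1}{2}\left|x^T\left((A^T A)^{-1} - (A'^T A')^{-1}\right)x\right| \ge \epsilon_0/2\right] \le \delta_0 \qquad (5)$$

Clearly, combining both (4) and (5) proves the claim. 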

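Let us prove (4). Denote the SVD of $A = U\Sigma V^T$ and $A' = U'\Lambda V'^T$, where the 
singular values of $A$ are $\sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_d > 0$ and the singular 
values of $A'$ are $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_d > 0$. Therefore we have 
$A^T A = V\Sigma^2 V^T$, $A'^T A' = V'\Lambda^2 V'^T$ and also $(A^T A)^{-1} = V\Sigma^{-2}V^T$, 
$(A'^T A')^{-1} = V'\Lambda^{-2}V'^T$. Thus $\det(A^T A) = \prod_{i=1}^d \sigma_i^2$ and 
$\det(A'^T A') = \prod_{i=1}^d \lambda_i^2$. 

This time, in order to bound the gap between the $\lambda_i$ and the $\sigma_i$, it isn't 
sufficient to use the trace of the matrices. Instead, we invoke an application of Lidskii's 
theorem (Theorem 9.4 in [Bha07]). 

Fact 4.4 (Lidskii). For every $k$ and every $1 \le i_1 < i_2 < \cdots < i_k \le n$ we have that 

$$\sum_{j=1}^k \lambda_{i_j} \le \sum_{j=1}^k \sigma_{i_j} + \sum_{i=1}^k sv_i(E)$$

where $\{sv_i(E)\}_{i=1}^n$ are the singular values of $E$ sorted in descending order. 

As a corollary, because $E$ has only 1 non-zero singular value, we denote 
$Big = \{i : \lambda_i > \sigma_i\}$ and deduce that $\sum_{i \in Big} (\lambda_i - \sigma_i) \le 1$. 
Similarly, since the singular values of $E$ and of $(-E)$ are the same, we have that 
$\sum_{i \notin Big} (\sigma_i - \lambda_i) \le 1$. Using this, proving (4) is straightforward: 

$$\sqrt{\prod_i \frac{\lambda_i^2}{\sigma_i^2}} \le \prod_{i \in Big} \frac{\lambda_i}{\sigma_i} = \prod_{i \in Big}\left(1 + \frac{\lambda_i - \sigma_i}{\sigma_i}\right) \le \exp\left(\frac{1}{w}\sum_{i \in Big}(\lambda_i - \sigma_i)\right) \le e^{1/w} \le e^{\epsilon_0/2}$$

and similarly, $\sqrt{\prod_i \frac{\lambda_i^2}{\sigma_i^2}} \ge e^{-\epsilon_0/2}$. 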

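We turn to proving (5). We start with the following derivation. 

$$x^T(A^T A)^{-1}x - x^T(A'^T A')^{-1}x = x^T(A^T A)^{-1}(A'^T A')(A'^T A')^{-1}x - x^T(A'^T A')^{-1}x$$
$$= x^T(A^T A)^{-1}\left((A+E)^T(A+E)\right)(A'^T A')^{-1}x - x^T(A'^T A')^{-1}x$$
$$= x^T(A^T A)^{-1}\left(A^T E + E^T A'\right)(A'^T A')^{-1}x$$

and using the SVD and denoting $E = e_i v^T$, we get 

$$x^T(A^T A)^{-1}x - x^T(A'^T A')^{-1}x = x^T\left(V\Sigma^{-1}U^T\right)e_i \cdot v^T\left(V'\Lambda^{-2}V'^T\right)x + x^T\left(V\Sigma^{-2}V^T\right)v \cdot e_i^T\left(U'\Lambda^{-1}V'^T\right)x$$

So now, assume $x$ is sampled from $A^T Y$. (The case of $A'^T Y$ is symmetric. In fact, the names 
$A$ and $A'$ are interchangeable.) That is, assume we've sampled $y$ from 
$Y \sim \mathcal{N}(0, I_{n \times n})$ and we have $x = A^T y = V\Sigma U^T y$ and equivalently 
$x = (A'^T - E^T)y = V'\Lambda U'^T y - v e_i^T y$. The above calculation shows that 

$$\left|x^T(A^T A)^{-1}x - x^T(A'^T A')^{-1}x\right| \le \mathrm{term}_1 \cdot \mathrm{term}_2 + \mathrm{term}_3 \cdot \mathrm{term}_4$$

where for $i = 1, 2, 3, 4$ we have $\mathrm{term}_i = |vec_i \cdot y|$ and 

$vec_1 = U\Sigma V^T V\Sigma^{-1}U^T e_i = e_i$, so $\|vec_1\| = 1$ 

$vec_2 = U'\Lambda^{-1}V'^T v - e_i v^T V'\Lambda^{-2}V'^T v$, so $\|vec_2\| \le \frac{1}{\lambda_d} + \frac{1}{\lambda_d^2}$ 

$vec_3 = U\Sigma^{-1}V^T v$, so $\|vec_3\| \le \frac{1}{\sigma_d}$ 

$vec_4 = e_i - e_i v^T V'\Lambda^{-1}U'^T e_i$, so $\|vec_4\| \le 1 + \frac{1}{\lambda_d}$ 

Recall that all singular values, both of $A$ and $A'$, are greater than $w$ and that 
$vec_i \cdot y \sim \mathcal{N}(0, \|vec_i\|^2)$, so w.p. $\ge 1 - \delta_0$ we have that for every 
$i$ it holds that $\mathrm{term}_i \le \sqrt{2\ln(4/\delta_0)}\,\|vec_i\|$, so 

$$\left|x^T(A^T A)^{-1}x - x^T(A'^T A')^{-1}x\right| \le 2\left(\frac{2}{w} + \frac{2}{w^2}\right)\ln(4/\delta_0) \le \frac{8\ln(4/\delta_0)}{w} \le \epsilon_0$$

This concludes the proof in our first stage. 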
Stage 2. 

We assume step 4 was applied, and denote $B = U(\sqrt{\Sigma^2 + w^2 I})V^T$ and 
$B' = U'(\sqrt{\Lambda^2 + w^2 I})V'^T$. We denote the singular values of $B$ and $B'$ as 
$\sigma_1^B \ge \sigma_2^B \ge \ldots \ge \sigma_d^B$ and 
$\lambda_1^B \ge \lambda_2^B \ge \ldots \ge \lambda_d^B$ resp. Observe that by definition, for 
every $i$ we have $(\sigma_i^B)^2 = \sigma_i^2 + w^2$ and $(\lambda_i^B)^2 = \lambda_i^2 + w^2$. 

Again, we assume we output $O = B^T Y$, and compare $X = B^T Y$ to $X' = B'^T Y$. The theorem 
merely requires Claim 4.3 to hold, which in turn depends on the following two conditions. 

$$e^{-\epsilon_0/2} \le \sqrt{\frac{\det(B'^T B')}{\det(B^T B)}} \le e^{\epsilon_0/2} \qquad (6)$$

$$\Pr\left[\frac{1}{2}\left|x^T\left((B^T B)^{-1} - (B'^T B')^{-1}\right)x\right| \ge \epsilon_0/2\right] \le \delta_0 \qquad (7)$$

The second stage deals with the problem that now, the gap $\Delta = B' - B$ is not necessarily a 
rank-1 matrix. However, what we show is that all stages in the proof of Claim 4.3 either rely on 
the singular values or can be written as the sum of a few rank-1 matrix multiplications. 

The easier part is to claim that Eq. (6) holds. The analysis is a simple variation on the proof of 
Eq. (4). Fact 4.4 still holds for the singular values of $A$ and $A'$. Observe that 
$\lambda_i^B > \sigma_i^B$ iff $\lambda_i > \sigma_i$. And so we have 

$$\prod_i \frac{(\lambda_i^B)^2}{(\sigma_i^B)^2} \le \prod_{i \in Big} \frac{\lambda_i^2 + w^2}{\sigma_i^2 + w^2} \le \prod_{i \in Big} \frac{\lambda_i^2}{\sigma_i^2}$$

and the remainder of the proof follows. 
We now turn to proving Eq. (7). We start with an observation regarding $A'^T A'$ and $B'^T B'$. 

$$A'^T A' = (A+E)^T(A+E) = A^T A + A'^T E + E^T A$$
$$B^T B = V(\Sigma^2 + w^2 I)V^T = V\Sigma^2 V^T + w^2 I = A^T A + w^2 I$$
$$B'^T B' = V'(\Lambda^2 + w^2 I)V'^T = A'^T A' + w^2 I$$
$$B'^T B' - B^T B = A'^T E + E^T A$$

Now we can follow the same outline as in the proof of (5). Fix $x$, then: 

$$x^T(B^T B)^{-1}x - x^T(B'^T B')^{-1}x = x^T(B^T B)^{-1}(B'^T B')(B'^T B')^{-1}x - x^T(B'^T B')^{-1}x$$
$$= x^T(B^T B)^{-1}\left[B^T B + A'^T E + E^T A\right](B'^T B')^{-1}x - x^T(B'^T B')^{-1}x$$
$$= x^T(B^T B)^{-1}\left[A'^T E + E^T A\right](B'^T B')^{-1}x$$
$$= x^T(B^T B)^{-1}(A^T + E^T)e_i \cdot v^T(B'^T B')^{-1}x + x^T(B^T B)^{-1}v \cdot e_i^T(A' - E)(B'^T B')^{-1}x$$

It is straightforward to see that the $i$-th spectral value of $(B^T B)^{-1}A^T$ is 
$\frac{\sigma_i}{\sigma_i^2 + w^2} \le \frac{1}{w}$, and similarly for the spectral values of 
$(B'^T B')^{-1}A'^T$. We now proceed as before and partition the above sum into multiplications of 
pairs of terms where $\mathrm{term}_i \le |vec_i \cdot y|$, and $y$ is sampled from 
$\mathcal{N}(0, I_{n \times n})$ and $x = B^T y$: 

$$x^T(B^T B)^{-1}x - x^T(B'^T B')^{-1}x = y^T\left[B(B^T B)^{-1}(A^T + E^T)e_i\right] \cdot \left[v^T(B'^T B')^{-1}B^T\right]y + y^T\left[B(B^T B)^{-1}v\right] \cdot \left[e_i^T(A' - E)(B'^T B')^{-1}B^T\right]y$$

Lastly, we need to bound all terms that contain the multiplication $(B'^T B')^{-1}B^T y$ in 
comparison to $(B'^T B')^{-1}B'^T y = B'^{\dagger} y$. For instance, take the term $|vec^T y|$ for 
$vec^T = e_i^T(A' - E)(B'^T B')^{-1}B^T$, and write it as $vec^T = z^T B^T$. We can only bound 
$\|Bz\|$ using $\sigma_1^B/(\lambda_d^B)^2$, whereas we can bound $\|B'z\|$ with 
$1/\lambda_d^B \le 1/w$. In contrast to before, we do not use the fact that 
$B^T y = (B' - \Delta)^T y$. Instead, we make the following derivations. 

First, we observe that for every vector $z$ we have that $\|Bz\| \ge w\|z\|$ and 
$\|B'z\| \ge w\|z\|$. Using the fact that $B^T B - B'^T B' = -A'^T E - E^T A$, a simple derivation 
gives that $\|Bz\|^2 \le (\|B'z\| + \|z\|)^2 \le (1 + \frac{1}{w})^2\|B'z\|^2$ and vice-versa. So 
if $y$ is s.t. $\frac{|z^T B^T y|}{(1 + \frac{1}{w})\|B'z\|} > Threshold$ then 
$\frac{|z^T B^T y|}{\|Bz\|} > Threshold$. Observe that $z^T B^T y$ is distributed like 
$\mathcal{N}(0, \|Bz\|^2) = \|Bz\|\,\mathcal{N}(0,1)$, and so we have that for every $\delta' > 0$ 

$$\Pr\left[|z^T B^T y| > \sqrt{\log(1/\delta')}\left(1 + \frac{1}{w}\right)\|B'z\|\right] = \Pr\left[\left(\left(1 + \frac{1}{w}\right)\|B'z\|\right)^{-1}|z^T B^T y| > \sqrt{\log(1/\delta')}\right]$$
$$\le \Pr\left[\|Bz\|^{-1}|z^T B^T y| > \sqrt{\log(1/\delta')}\right] \le \delta'$$

□ 



Corollary. Using the definitions of $r$ and $w$ as in Algorithm 3, the proof of Theorem 4.1 
actually shows that in the case that $A$ is a matrix with all singular values $\ge w$, the 
following simple algorithm preserves $(\epsilon, \delta)$-differential privacy: pick a random 
$r \times n$ matrix $M$ whose entries are iid normal Gaussians, and output $O = MA$. Furthermore, 
observe that if $\sigma_d$, the least singular value of $A$, is bigger than, say, $10w$, then one 
can release $\sigma_d + Lap(1/\epsilon)$ and then release $O = MA$. In such a case, users know that 
for any unit vector $x$, w.p. $\ge 1 - \nu$ it holds that 
$\frac{1}{r}\|Ox\|^2 \in (1 \pm \eta)\,\Phi_A(x)$. 
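
The utility side of this corollary can be illustrated with a short simulation. The sketch below is ours and makes several assumptions: it takes $\Phi_A(x) = \|Ax\|^2$, uses hypothetical function names, and checks only accuracy. It does not verify the singular-value condition on which the privacy guarantee depends.

```python
import random

def project(A, r, rng):
    """Return O = M A for an r x n matrix M of iid N(0, 1) entries."""
    n, d = len(A), len(A[0])
    O = []
    for _ in range(r):
        m = [rng.gauss(0.0, 1.0) for _ in range(n)]
        O.append([sum(m[i] * A[i][j] for i in range(n)) for j in range(d)])
    return O

def directional_estimate(O, x):
    """(1/r) * ||O x||^2, the estimate of ||A x||^2 available from the published O."""
    r = len(O)
    return sum(sum(o_j * x_j for o_j, x_j in zip(row, x)) ** 2 for row in O) / r

rng = random.Random(7)
A = [[3.0, 0.0], [0.0, 4.0], [1.0, 1.0]]
x = [0.6, 0.8]                                # unit-length direction
O = project(A, r=2000, rng=rng)
estimate = directional_estimate(O, x)
true_value = sum((a[0] * x[0] + a[1] * x[1]) ** 2 for a in A)
```

Each row of $O$ contributes an independent unbiased estimate of $\|Ax\|^2$, so the relative error shrinks roughly like $\sqrt{2/r}$ as $r$ grows.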

Comment. Comparing Algorithms 1 and 3, we have that in $L_G = E_G^T E_G$ we "translate" the 
spectral values by $w$, and in $A^T A$ we "translate" the spectral values by $w^2$. This is an 
artifact of the ability to directly compare the spectral values of $L_G$ and $L_{G'}$ in the first 
analysis, whereas in the second analysis we compare the spectral values of $A$ and $A'$ (vs. 
$A^T A$ and $A'^T A'$). This is why the noise bounds in the general case are $O(1/(\epsilon\eta))$ 
times worse than for graphs. 

4.2 Comparison with Other Algorithms 

To the best of our knowledge, no previous work has studied the problem of preserving the variance 
of $A$ in the same formulation as ours. We deal with a scenario where users pose the directions 
along which they wish to find the variance of $A$. Other algorithms, which publish the PCA or a 
low-rank approximation of $A$ without compromising privacy (see Section 1.1), provide users with 
specific directions and variances. These works are not comparable with our algorithm, as they give 
a different utility guarantee. For example, low-rank approximations aim at nullifying the 
projection of $A$ in certain directions. 

Here, we compare our method to the Laplace mechanism, the Multiplicative Weights mechanism 
and Randomized Response. The bottom line is clear: our method allows one to answer directional 
variance queries with additive noise which is independent of the given input. Other methods require 
we add random noise that depends on the size of the matrix, assuming we answer polynomially 
many queries. 

Our notation is as follows: $n$ denotes the number of rows in the matrix (the number of individuals 
in the data), $d$ denotes the number of columns in the matrix, and we assume each entry is at most 
1. As before, $\epsilon$ denotes the privacy parameter and $k$ denotes the number of queries. 
Observe that we (again) compare $k$ predetermined queries for non-interactive mechanisms with $k$ 
adaptively chosen queries for interactive ones. The remaining parameters are omitted from this 
comparison. Results are summarized in Table 2. 

4.2.1 Our Algorithm 

Our algorithm's utility is computed simply by plugging $\nu = O(1/k)$ into Theorem 4.2, which gives 
a utility bound of $O(\log(k)/\epsilon^2)$. 

4.2.2 Naively Adding Laplace Noise 

Again, the simplest alternative is to answer each directional-variance query with 
$\Phi_A(x) + Lap(0, \epsilon^{-1})$. The composition theorem of [DRV10] assures us that for $k$ 
queries we preserve $(O(\sqrt{k}\epsilon), \delta)$-differential privacy. 

4.2.3 Randomized Response 

We now consider a Randomized Response mechanism, similar to the Randomized Response mechanism 
of [GRU12]. We wish to output a noisy version of $A^T A$, by adding some iid random noise to 
each entry of $A^T A$. Since we call two matrices neighbors if they differ only on a single row, 
denote $v$ as the difference vector on that row. It is simple to see that by adding $v$ to some row 
in $A$, each entry in $A^T A$ can change by at most $\|v\|_1$. Recall that we require 
$\|v\|_2 = 1$ and so $\|v\|_1 \le \sqrt{d}$. 
| Method | Additive error | Multiplicative error? | Interactive? | Tractable? |
|---|---|---|---|---|
| Laplace Noise [DMNS06] | $O(\sqrt{k}/\epsilon)$ | no | yes | yes |
| Randomized Response | $O(\sqrt{d\log(k)}/\epsilon)$ | no | no | yes |
| MW [HR10] | $O(d\sqrt{n}\log(k)/\epsilon)$ | no | yes | yes |
| IDC [GRU12] | $O(d\sqrt{n\log(k)/\epsilon})$ | no | yes | yes |
| JL | $O(\log(k)/\epsilon^2)$ | yes | no | yes |

Table 2: Comparison between mechanisms for answering directional variance queries. 

Therefore, in order to preserve $(\epsilon, \delta)$-differential privacy, it is enough to add 
random Gaussian noise of $\mathcal{N}(0, \frac{d\log(d)}{\epsilon^2})$ to each of the $d^2$ entries 
of $A^T A$. 

Next we give the utility guarantee of the Randomized Response scheme. Fix any unit-length 
vector $x$. We think of the matrix we output as $A^T A + N$, where $N$ is a matrix of iid samples 
from $\mathcal{N}(0, \frac{d\log(d)}{\epsilon^2})$. Therefore, in direction $x$, we add to the true 
answer a random noise distributed like 

$$x^T N x \sim \mathcal{N}\left(0, \Big(\sum_{i,j} x_i^2 x_j^2\Big)\frac{d\log(d)}{\epsilon^2}\right) = \mathcal{N}\left(0, \frac{d\log(d)}{\epsilon^2}\right).$$

So w.h.p. the noise we add is within a factor of $O(\sqrt{d}/\epsilon)$ for each query, and for $k$ 
queries it is within a factor of $O(\sqrt{d\log(k)}/\epsilon)$. 

4.2.4 The Multiplicative Weights Mechanism 

It is not straightforward to adapt the Multiplicative Weights mechanism to answer directional 
variance queries. We represent $A^T A$ as a histogram over its $d^2$ entries (so the size of the 
"universe" is $N = d^2$), but it is not simple to estimate the equivalent of the number of 
individuals in this representation. We chose to take the pessimistic bound of $nd^2$, since this is 
the $L_1$ bound on the sum of entries in $A^T A$, but we comment that this is a highly pessimistic 
bound. It is fairly likely that the number of individuals in this representation can be set to only 
$O(d^2)$. 

Plugging these parameters into the utility bounds of the Multiplicative Weights mechanism, 
we get a utility bound of $O(d\sqrt{n}\log(k)/\epsilon)$. Plugging them into the improved bounds of 
the IDC mechanism, we get $O(d\sqrt{n\log(k)/\epsilon})$. Observe that even if we replace the 
pessimistic bound of $nd^2$ with just $d^2$, these bounds depend on $d$. 

5 Discussion and Open Problems 

The fact that the JL transform preserves differential privacy is likely to have more theoretical and 
practical applications than the ones detailed in this paper. Below we detail a few of the open 
questions we find most compelling. 

Error dependency on r. Our algorithm projects the edge-matrix of a given graph on $r$ random 
directions, then publishes these projections. The value of $r$ determines the probability that we 
give a good approximation to a given cut-query, and provided that we wish to give a good 
approximation to all cut-queries, our analysis requires us to set $r = \Omega(n)$. But is it just an 
artifact of the analysis? Could it be that a better analysis gives a better bound on $r$? It turns 
out that the answer is "no". In fact, the directions on which we project the data have high 
correlation with the published Laplacian. We demonstrate this with an example. 
Assume our graph is composed of a single perfect matching between $2n$ nodes, where node $i$ is 
matched with node $n + i$. Focus on a single random projection: it is chosen by picking 
$\binom{2n}{2}$ iid random values $x_{i,j} \sim \mathcal{N}(0,1)$, and for ease of exposition 
imagine that the values of the edges in the matching are picked first, then the values of all other 
pairs of vertices. Now, if we pick the value $x_{i,n+i}$ for the $(i, n+i)$ edge, then node $i$ is 
assigned $x_{i,n+i}$ while node $n + i$ is assigned $-x_{i,n+i}$. So regardless of the sign of 
$x_{i,n+i}$, exactly one of the two nodes $\{i, n+i\}$ is assigned the positive value 
$|x_{i,n+i}|$ and exactly one is assigned the negative value $-|x_{i,n+i}|$. Define $S$ as the set 
of $n$ nodes that are assigned the positive values and $\bar{S}$ as the set of $n$ nodes that are 
assigned the negative values. The sum of weight crossing the $(S, \bar{S})$-cut is distributed like 
$(X + \sqrt{w/n}\,Y)^2$ where $X = \sum_i |x_{i,n+i}|$ and $Y = \sum_{i,\, j \ne n+i} x_{i,j}$. 
Indeed, $Y$ is the sum of $n(n-1)$ random normal iid Gaussians, but $X$ is the sum of $n$ absolute 
values of Gaussians. So w.h.p. both $X$ and $Y$ are proportional to $n$. Therefore, in the 
direction of this particular random projection we estimate the $(S, \bar{S})$-cut as 
$\Omega((n \pm \sqrt{wn})^2) = \Omega(n^2)$ rather than $O(n)$. (If $X$ were distributed like the 
sum of $n$ iid normal Gaussians, then the estimation would be proportional to $(\sqrt{n})^2 = n$.) 

Assuming that the remaining $r - 1$ projections estimate the cut as $O(n)$, then by averaging over 
all $r$ random projections our estimation of the $(S, \bar{S})$-cut is $\omega(n)$, as long as 
$r = o(n)$. 
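
The heart of this example is that a sum of $n$ absolute values of Gaussians grows linearly in $n$, whereas a signed sum of $n$ Gaussians is typically only of order $\sqrt{n}$. The short simulation below is our own illustration of that gap; the function name `matching_sums` is hypothetical, and the constant $\sqrt{2/\pi}$ is the mean of $|x|$ for $x \sim \mathcal{N}(0,1)$.

```python
import math
import random

def matching_sums(n, rng):
    """Return (X, Z): the sum of |x_i| and the sum of x_i over n iid N(0, 1) samples."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return sum(abs(x) for x in xs), sum(xs)

rng = random.Random(3)
n = 10000
X, Z = matching_sums(n, rng)
linear_rate = X / n                 # concentrates around sqrt(2 / pi), about 0.798
signed_rate = abs(Z) / math.sqrt(n)  # stays of constant order
```

Squaring $X$ therefore gives a quantity of order $n^2$, while squaring the signed sum gives a quantity of order $n$, which is the discrepancy the example exploits.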
Error amplification or error detection. Having established that we do err on some cuts, we 
pose the question of error amplification. Can we introduce some error-correction scheme to the 
problem without increasing $r$ significantly? Error amplification without increasing $r$ will allow 
us to keep the additive error fairly small. One can view $\tilde{L}$ as a coding of answers to all 
$2^n$ cut-queries which is guaranteed to have at least a $1 - \nu$ fraction of the code correct, in 
the sense that we get an $(\eta, \tau)$-approximation to the true cut-query answer. As such, it is 
tempting to try some self-correcting scheme, like adding a random vector $x$ to the vector 
$\mathbf{1}_S$, then finding the estimation to $x^T \tilde{L} x$ and 
$(\mathbf{1}_S + x)^T \tilde{L} x$ and inferring $\mathbf{1}_S^T \tilde{L} \mathbf{1}_S$. We were 
unable to prove that such a scheme works due to the dot-product problem (see the paragraph on edges 
between $S$ and $T$) and to query dependencies. 

A related question is that of error detection: can we tell whether $\tilde{L}$ gives a good 
estimation to a cut query or not? One potential avenue is to utilize the trivial guess for 
$\Phi_G(S)$, the expected value $\frac{m}{\binom{n}{2}} s(n-s)$ (we can release $m$ via the Laplace 
mechanism). We believe this question is related to the problem of estimating the variance of 
$\{\Phi_G(S) : |S| = s\}$. 

Edges between S and T. Our work assures utility only for cut-queries. It gives no utility 
guarantees for queries regarding $E(S,T)$, the set of edges connecting two disjoint vertex-subsets 
$S$ and $T$. The reason is that it is possible to devise a graph where both $E(S, \bar{S})$ and 
$E(T, \bar{T})$ are large whereas $E(S,T)$ is fairly small. When $E(S, \bar{S})$ and 
$E(T, \bar{T})$ are big, the multiplicative error $\eta$ given to both quantities might add too 
much noise to an estimation of $E(S,T)$. 

The problem relates to the dot-product estimation of the JL transform. It is a classical result 
that if $M$ is a distance-preserving matrix and $u$ and $v$ are two vectors s.t. 
$\|M(u+v)\|^2 \approx \|u+v\|^2$ and $\|M(u-v)\|^2 \approx \|u-v\|^2$ then it is possible to bound 
the difference $|Mu \cdot Mv - u \cdot v|$. But this bound is a function of $\|u\|$ and $\|v\|$, 
which in our case translates to a bound that depends on $\|E_G \mathbf{1}_S\|$ and 
$\|E_G \mathbf{1}_T\|$, both vectors of potentially large norms. 
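
The classical dot-product bound mentioned here can be made concrete through the polarization identity $u \cdot v = (\|u+v\|^2 - \|u-v\|^2)/4$. The sketch below is our own worked illustration with hypothetical function names: if both squared norms are preserved up to a factor $(1 \pm \eta)$, the error on the dot product is at most $\eta(\|u+v\|^2 + \|u-v\|^2)/4 = \eta(\|u\|^2 + \|v\|^2)/2$, which depends on the norms of $u$ and $v$ and not on the size of $u \cdot v$ itself.

```python
def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def dot_via_norms(sq_norm_sum, sq_norm_diff):
    """Polarization identity: u.v = (||u+v||^2 - ||u-v||^2) / 4."""
    return (sq_norm_sum - sq_norm_diff) / 4.0

def dot_error_bound(u, v, eta):
    """Worst-case |Mu.Mv - u.v| when both squared norms are kept within (1 +/- eta)."""
    return eta * (dot(u, u) + dot(v, v)) / 2.0

u = [3.0, 0.0, 4.0]
v = [1.0, 2.0, 2.0]
s = [a + b for a, b in zip(u, v)]
t = [a - b for a, b in zip(u, v)]
exact = dot_via_norms(dot(s, s), dot(t, t))
bound = dot_error_bound(u, v, eta=0.1)
```

When $u$ and $v$ are long but nearly orthogonal, the bound can dwarf the quantity being estimated, which is exactly the difficulty with $(S,T)$-cut queries.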

Other Versions of JL. The analysis in this work deals with the most basic JL transform, using 
normal Gaussians. We believe that qualitatively the same results should apply for other versions 
of the JL transform (e.g., with entries taken in $\{\pm 1\}$). However, we are not certain whether 
the same results hold for sparse transforms (see [DKS10]). 

Low rank approximation of a given matrix. The work of [HR12] gives a differentially private 
algorithm that outputs a low-rank approximation of a given matrix $A$, while adding additive error 
$\ge \min\{\sqrt{d}, \sqrt{n}\}$. Our work, which introduces much smaller noise (independent of $n$ 
and $d$), does not have such guarantees. Our algorithm could potentially be integrated into theirs. 
In particular, their algorithm is composed of two stages, and our technique greatly improves the 
first of the two. The crux of the second stage lies in devising a way to preserve differential 
privacy when multiplying a given (non-private) $X$ with a private database $A$ without introducing 
too large an additive noise. Matrix multiplication via random projections might be such a way. 

Integration with the Multiplicative Weights mechanism. When the interactive Multiplica- 
tive Weights mechanism is given a user's query, it considers two possible alternatives: answering 
according to a synthetic database, or answering according to the Laplace mechanism. It chooses the 
latter alternative only when the two answers are far apart. Its utility guarantees rely on applying 
the Laplace mechanism only a bounded number of times. An interesting approach might be to 
add a third alternative, of answering according to the perturbed Laplacian we output. Hopefully, 
if most updates can be "charged" to answers provided by the perturbed Laplacian, it will allow us 
to improve privacy parameters (noise dependency on n).

A Facts from Linear Algebra

Below we prove the various facts from linear algebra that were mentioned in the body of the paper.
We add the proofs, yet we comment that they are not new. In fact, existing literature [HJ90][Bha07]
has documented proofs of the general theorems from which our facts are derived. Throughout
this section, we denote the $i$-th eigenvalue (resp. the $i$-th singular value) of a given matrix $M$ in
descending order, assuming all eigenvalues are real, as $\mathrm{ev}_i(M)$ (resp. as $\mathrm{sv}_i(M)$).

Proving Fact 3.5. The fact uses the max-min characterization of the singular values of a matrix.

Theorem A.1 (Courant-Fischer Min-Max Principle). For every matrix $A$ and every $1 \le i \le n$,
the $i$-th singular value of $A$ satisfies:

$$\mathrm{sv}_i(A) = \max_{S:\ \dim(S)=i}\ \min_{x \in S:\ \|x\|=1} \langle Ax, x\rangle$$

Claim A.2 (Weyl Inequality). Let $A$ and $B$ be positive semidefinite matrices s.t. the matrix
$E = B - A$ satisfies $x^T E x \ge 0$ for every $x$. Then for every $1 \le i \le n$ it holds that

$$\mathrm{sv}_i(A) \le \mathrm{sv}_i(B)$$

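Proof. Let $S_A$ be the $i$-dimensional subspace s.t. $\mathrm{sv}_i(A) = \min_{x \in S_A:\ \|x\|=1} \langle Ax, x\rangle$. For every $x \in S_A$
we have that

$$\langle Ax, x\rangle = \langle Ax, x\rangle + 0 \le \langle Ax, x\rangle + \langle Ex, x\rangle = \langle Bx, x\rangle$$

so $\mathrm{sv}_i(A) \le \min_{x \in S_A:\ \|x\|=1} \langle Bx, x\rangle$. Thus $\mathrm{sv}_i(B) = \max_{S:\ \dim(S)=i} \min_{x \in S:\ \|x\|=1} \langle Bx, x\rangle \ge \mathrm{sv}_i(A)$. □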
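
A small worked instance of the inequality may help. The matrices below are chosen here purely for illustration and do not appear elsewhere in the paper; $E$ is a rank-1 positive semidefinite perturbation.

```latex
% Illustrative matrices (assumed for this example only).
\begin{align*}
A &= \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}, \qquad
E = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}, \qquad
x^T E x = (x_1 + x_2)^2 \ge 0, \qquad
B = A + E = \begin{pmatrix} 3 & 1 \\ 1 & 2 \end{pmatrix}, \\
\det(B - \lambda I) &= \lambda^2 - 5\lambda + 5 = 0
 \;\Longrightarrow\; \lambda = \tfrac{5 \pm \sqrt{5}}{2} \approx 3.618,\ 1.382, \\
\mathrm{sv}_1(A) &= 2 \le 3.618 \approx \mathrm{sv}_1(B), \qquad
\mathrm{sv}_2(A) = 1 \le 1.382 \approx \mathrm{sv}_2(B).
\end{align*}
```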

Fact 3.5 is a direct application of Claim A.2 to $L_G$ and $L_{G'}$.

Proving Fact 3.6. The proof builds on the following two claims.

Claim A.3. Let $A$ be a positive-semidefinite matrix. If $x^T A x \ge x^T x$ for every $x \in (\mathrm{Ker}(A))^\perp$ then
it also holds that $x^T A^\dagger x \le x^T x$ for every $x \in (\mathrm{Ker}(A))^\perp$.

Proof. Denote the SVD of $A = V\Sigma^2 V^T = \sum_{i=1}^r \sigma_i^2 v_i v_i^T$ where $v_i$ is the $i$-th column of $V$. Fix
$x \in (\mathrm{Ker}(A))^\perp$ and observe that $x$ is spanned by the same $r$ vectors $\{v_1, v_2, \ldots, v_r\}$, so we can write
$x = \sum_{i=1}^r \alpha_i v_i$. Denote $y = V\Sigma^{-1}V^T x$. We have that $y = \sum_{i=1}^r \alpha_i \sigma_i^{-1} v_i$ so
$y \in (\mathrm{Ker}(A))^\perp$. Therefore $y^T A y \ge y^T y$, but $y^T y = x^T A^\dagger x$ and $y^T A y = x^T x$. □

Claim A.4. Let $A$ and $B$ be two positive-semidefinite matrices s.t. $\mathrm{Ker}(A) = \mathrm{Ker}(B)$. Then if
for every $x$ we have that $x^T A x \le x^T B x$ then $x^T A^\dagger x \ge x^T B^\dagger x$.

Proof. We denote the SVD $A = V\Sigma^2 V^T$ and $B = W\Pi^2 W^T$. Because we can split any vector $x$
into the direct sum $x = x_0 + x_\perp$ where $x_0 \in \mathrm{Ker}(A) = \mathrm{Ker}(B)$ and $x_\perp \in (\mathrm{Ker}(A))^\perp$, and since
the required inequality holds trivially for $x_0$, we need to show it holds for $x_\perp$.
Given any $z \in (\mathrm{Ker}(A))^\perp$, set $y = V\Sigma^{-1}V^T z$. We know that $y^T A y \le y^T B y$, and therefore

$$z^T z = y^T A y \le y^T B y = z^T \left(V\Sigma^{-1}V^T\, W\Pi^2 W^T\, V\Sigma^{-1}V^T\right) z =: z^T C z$$

The above proves that $C$ is a positive semidefinite matrix whose kernel is exactly $\mathrm{Ker}(A) = \mathrm{Ker}(B)$,
and so it follows from Claim A.3 that $z^T z \ge z^T C^\dagger z$. Let $I_{\mathrm{Ker}(C)^\perp}$ be the matrix which nullifies
every element in $\mathrm{Ker}(C)$, yet operates like the identity on $(\mathrm{Ker}(C))^\perp$. One can easily check that
$C^\dagger = V\Sigma V^T\, W\Pi^{-2} W^T\, V\Sigma V^T$ by verifying that indeed $C^\dagger C = C C^\dagger = I_{\mathrm{Ker}(C)^\perp}$. So now, given $x$
we denote $z = V\Sigma^{-1}V^T x$ and apply the above to deduce $x^T B^\dagger x = z^T C^\dagger z \le z^T z = x^T A^\dagger x$. □

Fact 3.6 is a direct application of Claim A.4 to $L_G$ and $L_{G'}$.

Proving Fact 4.4. Much like Claim A.2 follows from the Courant-Fischer Min-Max principle,
Lidskii's theorem follows from a generalization of this principle.

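Theorem A.5 (Wielandt's Min-Max Principle). Let $A$ be an $n \times n$ symmetric matrix. Then for
every $k$ and every $k$ indices $1 \le i_1 < i_2 < \cdots < i_k \le n$ we have that

$$\sum_{j=1}^k \mathrm{ev}_{i_j}(A) = \max_{\substack{S_1 \subset S_2 \subset \cdots \subset S_k \\ \dim(S_j) = i_j}}\ \min_{\substack{x_j \in S_j \\ x_j \text{ orthonormal}}}\ \sum_{j=1}^k \langle A x_j, x_j\rangle$$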

Claim A.6 (Lidskii's theorem). Let $A$ and $B$ be $n \times n$ symmetric matrices. Denote $E = B - A$.
Then for every $k$ and every $k$ indices $1 \le i_1 < i_2 < \cdots < i_k \le n$ we have that

$$\sum_{j=1}^k \mathrm{ev}_{i_j}(B) \le \sum_{j=1}^k \mathrm{ev}_{i_j}(A) + \sum_{i=1}^k \mathrm{ev}_i(E)$$

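Proof. Fix $i_1 < i_2 < \cdots < i_k$ and let $T_1 \subset \cdots \subset T_k$, with $\dim(T_j) = i_j$, be the subspaces for which

$$\sum_{j=1}^k \mathrm{ev}_{i_j}(B) = \min_{\substack{x_j \in T_j \\ x_j \text{ orthonormal}}}\ \sum_{j=1}^k \langle B x_j, x_j\rangle$$

For every $v_1, v_2, \ldots, v_k$ orthonormal we have that $\sum_{j=1}^k \langle B v_j, v_j\rangle = \sum_{j=1}^k \langle A v_j, v_j\rangle + \sum_{j=1}^k \langle E v_j, v_j\rangle \le
\sum_{j=1}^k \langle A v_j, v_j\rangle + \sum_{i=1}^k \mathrm{ev}_i(E)$, so

$$\sum_{j=1}^k \mathrm{ev}_{i_j}(B) \le \min_{\substack{x_j \in T_j \\ x_j \text{ orthonormal}}}\ \sum_{j=1}^k \langle A x_j, x_j\rangle + \sum_{i=1}^k \mathrm{ev}_i(E)$$

and clearly

$$\sum_{j=1}^k \mathrm{ev}_{i_j}(A) = \max_{\substack{S_1 \subset \cdots \subset S_k \\ \dim(S_j) = i_j}}\ \min_{\substack{x_j \in S_j \\ x_j \text{ orthonormal}}}\ \sum_{j=1}^k \langle A x_j, x_j\rangle \ \ge \min_{\substack{x_j \in T_j \\ x_j \text{ orthonormal}}}\ \sum_{j=1}^k \langle A x_j, x_j\rangle \qquad □$$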
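
To see what the claim says in its simplest form, consider the special case of a single index. The short derivation below is an illustration added for the reader; it only specializes the statement of Lidskii's theorem and applies it a second time with the roles of the two matrices exchanged.

```latex
% Special case k = 1, i_1 = i of Lidskii's theorem, with E = B - A.
\begin{align*}
\mathrm{ev}_i(B) &\le \mathrm{ev}_i(A) + \mathrm{ev}_1(E), \\
% Exchange A and B, so that the perturbation becomes -E and ev_1(-E) = -ev_n(E):
\mathrm{ev}_i(A) &\le \mathrm{ev}_i(B) + \mathrm{ev}_1(-E) = \mathrm{ev}_i(B) - \mathrm{ev}_n(E), \\
\text{hence}\qquad \mathrm{ev}_n(E) &\le \mathrm{ev}_i(B) - \mathrm{ev}_i(A) \le \mathrm{ev}_1(E).
\end{align*}
```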

Now, Fact 4.4 follows from Claim A.6 and from the following observation of Wielandt.^1 Given
an $m \times n$ matrix $M$, the matrix $N = \begin{pmatrix} 0 & M \\ M^T & 0 \end{pmatrix}$ is symmetric and has eigenvalues which are (in
descending order) $\{\mathrm{sv}_1(M), \mathrm{sv}_2(M), \ldots, \mathrm{sv}_m(M), 0, 0, \ldots, 0, -\mathrm{sv}_m(M), -\mathrm{sv}_{m-1}(M), \ldots, -\mathrm{sv}_1(M)\}$.
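
The observation can be checked directly from a singular value decomposition. The verification below is a sketch under the stated assumptions (in particular $m \le n$); the names $u_i$, $v_i$, $\sigma_i$, and $w$ are introduced here for the illustration.

```latex
% Assumptions for this sketch: m \le n, and M = \sum_{i=1}^m \sigma_i u_i v_i^T is an SVD with
% orthonormal u_i \in \mathbb{R}^m, orthonormal v_i \in \mathbb{R}^n, and \sigma_i = \mathrm{sv}_i(M).
\begin{align*}
M v_i = \sigma_i u_i, \quad M^T u_i = \sigma_i v_i
\;\Longrightarrow\;
\begin{pmatrix} 0 & M \\ M^T & 0 \end{pmatrix}
\begin{pmatrix} u_i \\ \pm v_i \end{pmatrix}
 = \begin{pmatrix} \pm M v_i \\ M^T u_i \end{pmatrix}
 = \pm\sigma_i \begin{pmatrix} u_i \\ \pm v_i \end{pmatrix}.
\end{align*}
% The remaining n - m eigenvectors have the form (0, w) with w orthogonal to v_1, ..., v_m,
% so that M w = 0 and the corresponding eigenvalue is 0.
```

This accounts for $2m + (n - m) = m + n$ mutually orthogonal eigenvectors, matching the dimension of the symmetric matrix.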



^1 We thank Moritz Hardt for bringing this observation to our attention.